Introduction


Computer games are many and varied, but most of them have something in
common: they need a place to store all their important files, such as
images, movies, and sounds. To do this, game developers typically pack their
data into one big archive file.


There are many reasons for storing all your data files in one big archive.
They include reducing the number of files on a CD, hiding the data files to
discourage people from hacking the game, and allowing all data files to be
accessed through a single data stream.


However, the bad news for gamers is that there are almost as many different 
archives as there are different computer games — every game developer 
creates their own archive formats, and they even change their formats 
between games or departments in the company. 


This brings us to the focus of the tutorial — how to explore the archives and 
grab the files from within them. This tutorial will attempt to make it easy for 
anyone to explore a new format, with the aim of promoting game 
modifications and enhancements by the community. 


In the following pages, we will discuss the terms Game Resource Archives 
(GRAs) and Game Resource Archive Formats (GRAFs), common data types, 
and other definitions. From there, we will explain the fundamentals of cracking 
a file format, including the tools you use, and the patterns to look out for. 


Thanks for reading our guide; we wish you the best of luck in your
exploration.
What is a GRA?


We should start by defining an archive. An archive is a file that usually
stores many small individual data files inside it. The best-known archive
format, and one you will probably be familiar with, is the *.zip archive,
which you should recognise as a way to package many files together into a
single file.


A GRA (Game Resource Archive) is an archive that is used by a specific
game, and contains resource or data files that are used within the game, such
as images, sounds, scripts, and text.
What is a GRAF?


GRAF (Game Resource Archive Format) describes the way a game archive is
constructed, and in particular, how and where the files are found within the
archive. These formats usually differ for each individual game; occasionally,
however, a game developer will stick with a particular format for a few
games of the same vintage.


Programmers usually define their GRAFs according to the needs and
structure of the game itself, i.e. the game engine. A common principle,
however, is to use a format that can be opened quickly and easily by the
game, even though the files in the archive may be of different types. For
example, different levels in a game need different sounds and textures, so
often the sounds and images for each level are packaged into the same
archive, even though images and sounds are clearly handled differently.


In addition, the actual resources used in a game change frequently as the
game is being developed. To make it easy for the game to adapt to the
changes, the archive format is structured to be ‘universal’ in its approach to
storing files. In simpler terms, the archive needs to provide the game engine
with a common and recognisable pattern of resource files, and a list of the
files that the GRA contains.
Tools


Hex Editors 


Hex editors are the standard class of program that you use if you wish to 
crack the formats of a newly encountered game. 


A generic hex editor displays the contents of any file in much the same way
that a word processor shows a piece of text: from left to right, line by
line, all the way down to the last character. However, a hex editor differs
from a word processor in that it shows the file as hexadecimal numbers rather
than letters.


There are many hex editors available for free download over the Internet;
however, the one that we can fully recommend is Hex Workshop from
Breakpoint Software. Hex Workshop is the editor used in all the screenshots
and examples. Part of the reason we recommend it is that it includes some
handy little functions, such as an easy-to-use hexadecimal calculator, a list
of all data type interpretations at the current location of your cursor,
bookmarking, and colour mapping.
Hex Workshop


Examine Figure 1 carefully, as it shows the hex editor that will be used
throughout this guide, Hex Workshop. After installation you can either start it
and open any file you want using the File menu, or (and this you will do most
of the time) open files from the context menu in Windows Explorer. To do
that, right click on any file in Explorer and select
“Hex edit with Hex Workshop”.

Once you have opened a file, you will be presented with a view of the file
content similar to that depicted in Figure 1. You can examine the file’s
hexadecimal interpretation (A), or the ASCII interpretation of the same bytes
(B). The column at the far left shows the offsets of the lines shown. As you
can see, we have just opened one of Doom 3's PK4 files. We will later see that
these are actually ZIP files. For now, you can see the file starts with the
characters "PK". In this case, it is actually an ID tag that is used in all ZIP
archives. PK stands for Phillip Katz, the author of the ZIP compression
technique, who died at the age of 37 from the effects of alcoholism in April
2000.

The current cursor position is at offset 18 (see figure) and the Data
Interpreter (C) shows the value at this file position. It shows different data
types (see Chapter 4), ranging from bytes to strings to binary values, each
interpreted from the byte sequence starting at this offset.

In the figure, we have colour mapped and bookmarked (D) some areas of
interest. You can select any range of bytes in the file and bookmark or colour
map it. Simply drag the cursor along your area of interest while holding the
left mouse button, then right click the highlighted area to show the context
menu from which you select the options. When you bookmark a range, you can
choose how the bookmark should be interpreted (value), and give it a
description. The bookmarks are shown with their offset in the file and their
size in bytes (length). This is a handy feature as you can click on any
bookmark to quickly switch to that offset.

When you leave Hex Workshop it will ask you if you want to save your colour
map or bookmarks. Later, you can load these regardless of the file you have
opened. Thus, if you have solved the puzzle of a GRAF, you can apply the
bookmarks and colour mapping to other files that you expect to have the same
format.


Hex Workshop has another handy feature, GoTo, which you can also access
from the context menu. There are a number of options. With the basic GoTo
option you can type in an address (offset) in the file you wish to go to. Two
more powerful options are accessed by selecting a range of bytes (preferably
one that you expect to represent an offset value) and then right clicking it.
You can then either GoTo the value of your selected range, or GoTo the
address given by your value plus the current offset of your selection.

For instance, suppose you expect a 4-byte (32-bit) value at offset 4 to
represent the offset of a resource in a GRA. You can then select the 4
bytes from this position (bytes 4, 5, 6, and 7) and jump to the location in the
file this 32-bit number points to. On the other hand, it might be that your file
has an ID tag (GRAIS) of 4 bytes (bytes 0-3, see also File Offsets) and the
GRAF does not count the 4 bytes of this tag when referring to offsets of
resources. Thus the number you are labelling as an offset may in fact be a
relative offset, meaning that it is relative to some other offset (in this case
the offset where the GRAIS ends). You can easily jump to this relative location
with Hex Workshop, as the other GoTo option you are offered is to jump to the
location your value indicates plus the offset of the currently selected area.
You can see that this program can come in very handy indeed if you are looking
to unravel the puzzle of new archive formats.
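
The difference between the two GoTo interpretations can also be tried out in a
few lines of code. The following is a minimal Python sketch, not part of Hex
Workshop; the archive bytes, the "DEMO" tag, and the helper name read_u32 are
all made up for illustration, and the value is assumed to be stored in
Little-Endian order.

```python
import struct

def read_u32(data, position):
    """Read an unsigned 32-bit little-endian value stored at `position`."""
    return struct.unpack_from("<I", data, position)[0]

# Hypothetical archive: 4-byte tag, 4-byte offset field, 4 filler bytes, data.
archive = b"DEMO" + struct.pack("<I", 8) + b"\x00" * 4 + b"DATA"

field_offset = 4                          # where the offset field itself sits
stored = read_u32(archive, field_offset)  # the number held in bytes 4 to 7

absolute_target = stored                  # value counted from byte 0 of the file
relative_target = field_offset + stored   # value counted from the field's offset

print(archive[absolute_target:absolute_target + 4])  # lands in the filler bytes
print(archive[relative_target:relative_target + 4])  # lands on the resource data
```

In this made-up file only the relative interpretation lands on the data, which
is the kind of check you would make by eye after each jump in the hex editor.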
Figure 1. Layout of Hex Workshop. A. Hexadecimal representation of file
content in sequential byte order. B. ASCII interpretation of file content. C. 
Interpretation of data at current cursor position. D. User window showing 
bookmarked areas in the file and the description given to these areas by the 
user. 
Terms, Definitions, and Data Structures


To understand the patterns and construction of archives, we must first
introduce the concept of data structures.


Files 


Files are typically dealt with as a series of bytes stored one after the other,
which when combined form a representation of a piece of data. If you have a
file that is 12 bytes in size, the file contains 12 single bytes that can be
read in a certain way to represent a value. File sizes are measured first in
bytes, then in kilobytes (KB, 1024 bytes) and megabytes (MB, 1024 kilobytes),
terms that loosely stand for thousands and millions of bytes respectively.
Bits


When we talk about the basic structure of a file, we typically think in terms of 
bytes. However, the actual underlying file structure, at its absolute simplest, is 
a sequence of bits or binary values, but we usually don’t deal with this level of 
representation too often because binary values don’t represent anything 
meaningful in their singular state — they need to be grouped into sets of 8 bits 
to become a valid representation of data. 


A bit, or binary value, is the language of a computer, and thus the underlying
structure of everything readable by a computer. A bit only has 2 possible
values, 1 or 0, so it is obvious why bits are limited in what they represent.
However, when we take groups of bits and join them together, their values
combine into larger numbers.


The number of grouped bits needed to represent basic data is 8 bits, more 
commonly known as a byte. A byte can hold any value between 0 and 255, 
much more than the 2 possible combinations available to a bit. 


So how do coupled bits represent a larger numerical value such as that of a
byte? This is achieved quite easily by referring to each of the 8 bits as an
increasing power of 2. If we take a look at a single bit, we can think of it as
having either the value 1x2^0 or 0x2^0, thus giving us the values 1 or 0. If we
add a bit to the left, the value of the new bit is either 1x2^1 or 0x2^1,
either 2 or 0. By adding the values of these 2 bits together, you should be
able to see that all possible combinations will give us the values 0, 1, 2,
or 3. This is shown in the table below.


Bit 1 (2^1)   Bit 0 (2^0)   Value
0             0             0 (0x2^1 + 0x2^0)
0             1             1 (0x2^1 + 1x2^0)
1             0             2 (1x2^1 + 0x2^0)
1             1             3 (1x2^1 + 1x2^0)


If we continue this pattern for the remaining 6 bits (to make a total of 8
bits, i.e. 1 byte) then we can provide powers up to 2^7, and thus if all 8 bits
had the value 1 and we added them together, we would end up with the number
255. Appendix A shows all possible values of a byte, and the 8 bits used to
create the value.
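
The same sum of bit x power of 2 can be written as a short program. This is
only a sketch in Python; the function name bits_to_value is our own choice for
illustration.

```python
def bits_to_value(bits):
    """Add up bit x 2^position, where the rightmost bit is at position 0."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

# The four combinations from the 2-bit table.
two_bit_values = [bits_to_value(pattern) for pattern in ("00", "01", "10", "11")]

# A full byte with every bit set to 1.
all_ones = bits_to_value("11111111")

print(two_bit_values)
print(all_ones)
```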
Bytes


As described above, a byte is composed of 8 bits, and can thus contain a
value between 0 and 255. When you open a file in a hex editor, each value
you see represents a single byte value. You will probably not see numbers
between 0 and 255 representing the bytes in a file; rather you will most
probably see hexadecimal representations of the values, ranging from 00 to
FF. For an explanation of hexadecimal numbering, see the Hexadecimal
Numbering section below.
16-bit (2-byte) numbers


From this point forward, we need to be careful what we call each of these data 
types. Why? Because each programming language uses different names for 
the same values. 


A 16-bit value is commonly known in older programming languages (C++ or 
earlier) as a word or an Integer. Newer programming languages, such as Java 
and the .NET languages, call it a Short. 


A 16-bit number is just as the name suggests, a number created by 16 bits in 
a row. To determine the value of the 16-bit number, we follow the same 
process as when we wanted to get the value of a byte. 

Each of the 16 bits that make up the 16-bit number represents a power of 2:
the leftmost bit represents 2^15 and the rightmost bit 2^0. Just as with bytes,
we just go through each bit and calculate the bit value x power.


An example: let's say we have the following 16 bits...
0101111000001100


Working from left to right, we get the value...
0x2^15 + 1x2^14 + 0x2^13 + 1x2^12 + 1x2^11 + 1x2^10 + 1x2^9 + 0x2^8 + ... + 1x2^3 + 1x2^2 + 0x2^1 + 0x2^0


If you work this out, you should end up with the number 24076.


If all 16 bits had the value 1, you would end up with the number 65535 — 
therefore the value of a 16-bit number ranges between 0 and 65535. 
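
If you want to check a calculation like this without doing the sums by hand,
most programming languages can parse binary digits directly. A small sketch in
Python, using the example pattern written out with all 16 digits:

```python
pattern = "0101111000001100"       # the example bits, written with all 16 digits

value = int(pattern, 2)            # parse the digits as a base-2 number
largest = int("1" * 16, 2)         # all 16 bits set to 1
round_trip = format(value, "016b") # back to 16 binary digits

print(value, largest, round_trip)
```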
32-bit (4-byte) numbers


If you have been following us so far, you should be able to see just how you
would calculate the value of a 32-bit number. A 32-bit number uses 32 bits to
represent it, and therefore has a value between 0 and 4294967295. This is a
very large range, which is why most values used in archives are stored as
32-bit numbers.


A 32-bit number in older programming languages is known as a dword or a 
Long. In newer languages, this is known as an Integer. 
64-bit (8-byte) numbers


As with the previous example, this is a number represented by 64 bits, and
thus can contain any value between 0 and a massive
18446744073709551615. However, 64 bits can also represent a floating-point
number. These range in value from
-1.79769313486232E308 to -4.94065645841247E-324 for negative values
and from 4.94065645841247E-324 to 1.79769313486232E308 for positive
values. These numbers are very rarely used in GRAs, but will become
more popular as things like 64-bit processors take off. Programming
languages can refer to these as Longs, Doubles and even Floats.
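
The same 8 bytes can therefore mean very different things depending on which
data type you read them as, which is exactly what the Data Interpreter in a hex
editor shows. The sketch below uses Python's struct module; the 8 sample bytes
are made up for illustration, and Little-Endian order is assumed.

```python
import struct

# Eight made-up bytes, as they might appear in a hex editor.
raw = bytes([0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xF0, 0x3F])

as_uint64 = struct.unpack("<Q", raw)[0]     # one unsigned 64-bit integer
as_double = struct.unpack("<d", raw)[0]     # one 64-bit floating-point number
as_two_uint32 = struct.unpack("<II", raw)   # two unsigned 32-bit integers

largest_uint32 = 2 ** 32 - 1
largest_uint64 = 2 ** 64 - 1

print(as_uint64, as_double, as_two_uint32)
```

When a field gives a nonsensical value as one type, it is worth trying the
other interpretations before giving up on it.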
Strings


One of the most common tasks performed on a computer is word processing, 
so naturally we need some way of representing text in a document. A piece of 
text in a document is called a String, which more formally means a sequence 
of characters. 


Although there are many languages in the world, and Spanish is the most widely
spoken Latin-based language, the first language used in computing in the
Western world was English. The English script consists of 52 letters (upper
and lower case), 10 digits, and about 30 symbols. As this adds up to about 92,
it seems quite logical that we can represent each character as a value in a
byte (remembering that a byte supports 256 different values). This is exactly
what happens when you open a text document in a word processor: the word
processor reads the bytes of the file and represents each byte value as a
character.


For example, when the word processor reads a byte with value 65, it displays
the letter ‘A’. The byte value 100 represents the letter ‘d’. You can then see
that each byte value between 0 and 255 represents a letter, number, or
symbol. It is for this reason that if you open a non-text file in a word
processor, you will see a lot of different letters and symbols. The word
processor is displaying each byte as if it were a letter, because it simply
doesn't know that it isn't a text file.


A character, therefore, is an 8-bit number that is represented as a readable
English symbol. A group of characters written in sequence is known as a
String, but only if it is intended to be read as text. If you see a group of
English letters in a file that doesn’t make any sense, it may not be a String.

Caution should be taken before discarding a string of characters, however.
Nowadays many different countries sell mainstream programs, and more often
than not you will find pieces of text that are in a different language. What
seems like gibberish to an English speaker might clearly be another language.
For instance, the bytes may represent East Asian characters that will be
displayed in the right script if you have the software support. Japanese
software is well-known, and its developers often use their own language to
describe files or events in their archives. Likewise, many titles come from
Europe, so the strings could be in Russian, Finnish, German, Hungarian and
so forth.
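
To see the byte-to-character mapping in action, and to hunt for likely strings
in a file, you can try something like the following Python sketch. The function
name find_text_runs, the minimum length of 4, and the sample bytes are all
assumptions made for illustration; the search only recognises plain English
(ASCII) characters, so text in other scripts will not be found this way.

```python
import re

first_letter = chr(65)     # the character for byte value 65
code_of_d = ord("d")       # the byte value for the letter d

def find_text_runs(data, min_length=4):
    """Return runs of printable ASCII bytes at least `min_length` long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return re.findall(pattern, data)

# Made-up file content: two filenames surrounded by non-text bytes.
sample = b"\x00\x01snd1.wav\x00\xff\x10pic1.bmp\x00"
runs = find_text_runs(sample)

print(first_letter, code_of_d, runs)
```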
Hexadecimal Numbering


Hexadecimal numbering represents a byte value in a convenient and compact 
way, and is the most common form of numbering used in a hex editor. 


Recall that a byte can contain a value between 0 and 255. To write this
number in hexadecimal, we split the number into 2 digits, each representing
a power of 16, much like binary digits represent powers of 2.


Because we are talking about a base greater than 10, we need to add in
some additional symbols to represent the numbers 10 through 15 with a
single character. To do this, we use the letters A through F to represent the
numbers 10 through 15. So, the letter C in hexadecimal represents the
number 12.


Now how do we write a number in base 16? As mentioned earlier, the byte
value is split up into 2 digits, with the first digit representing 16^1 and the
second digit representing 16^0. You may notice that this is similar to the way
bits are joined together to form larger numbers.


The second digit of the pair can take any value between 0 and 15 (labelled
0 through F), where the value represents digit x 16^0. So, if the second
digit was 6, it would represent 6x16^0, the value 6. If the second digit was
B, it would similarly represent 11x16^0, the value 11.


The first digit of the pair represents the value digit x 16^1. So, if the first
digit was 2, it would represent 2x16^1, the value 32.


Let's look at a full example now. If we are given the hexadecimal value 1F,
what does it represent? The 1 means 1x16^1 and the F means 15x16^0.
Added together, we get 16+15, the value 31. Similarly, the hexadecimal
number E3 represents 14x16^1 + 3x16^0, the number 227.


It should be clear to you now that we can represent a byte (values 0 through
255) in the hexadecimal number system using the values 00 through FF. Here
we print hexadecimals as &h «value».
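
The digit-by-digit calculation can be sketched in Python as follows. The
function name hex_pair_to_value is our own, chosen for illustration; Python's
built-in int and format functions are shown alongside it as a cross-check.

```python
def hex_pair_to_value(pair):
    """Convert a 2-digit hexadecimal string to a byte value."""
    digits = "0123456789ABCDEF"
    first = digits.index(pair[0].upper())    # worth first x 16^1
    second = digits.index(pair[1].upper())   # worth second x 16^0
    return first * 16 ** 1 + second * 16 ** 0

value_1f = hex_pair_to_value("1F")
value_e3 = hex_pair_to_value("E3")

# The built-in conversions, in both directions.
built_in = int("E3", 16)
back_to_hex = format(227, "02X")

print(value_1f, value_e3, built_in, back_to_hex)
```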
Signed and Unsigned Numbers


So by now you are clear as to how numbers are stored in files, and even how 
strings are stored, but what about negative numbers? Luckily, negative 
numbers are really easy. 


There are only 2 possible classes of numbers used here: either positive or
negative. This maps perfectly onto a single bit of value 0 or 1 respectively.


Rather than add an extra bit to a number, we take the bit with the highest
value and interpret it as a positive or negative sign rather than as part of
the number. For example, in an 8-bit number, you would count all the bits
from 2^6 to 2^0, and the value of the 2^7 bit will determine whether the value
is positive or negative. You should note that because the highest bit is being
used for another purpose, it cannot be used as part of the number itself. This
effectively cuts the largest possible value in half. In our example, you
would normally be able to have any value between 0 and 255; however with
the sign bit we now have numbers between -127 and 127. Also note that
0 and -0 both indicate the number 0. In practice, most modern systems use a
variation of this scheme called two's complement, in which the highest bit
still marks a negative number but there is only one zero, so a signed 8-bit
number ranges between -128 and 127.


Here we need to introduce a way of knowing whether a number will be
positive-only, or a positive/negative number. We therefore use the term
signed to indicate that the highest bit is used as a sign, or the term unsigned
to indicate that the number is always positive. Therefore, if you are told a
16-bit number is unsigned, you will know the number ranges between 0 and 65535.
However, if it was a signed 16-bit number, it would range between -32767 and
32767 (or -32768 and 32767 in the two's complement encoding used by most
modern systems).


Note that for archives, it is very rare that you would need to use signed
numbers. Unless mentioned, you should assume all numbers used in archives
and in this document are unsigned.
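
Reading a field with the wrong signedness is a common cause of nonsense
values, so it can help to compare both readings. The following Python sketch
uses the struct module, which interprets signed values as two's complement, the
encoding used on virtually all current hardware; the sample bytes are made up.

```python
import struct

raw = bytes([0xFF, 0xFF])   # two made-up bytes with every bit set

unsigned_16 = struct.unpack("<H", raw)[0]    # read as unsigned 16-bit
signed_16 = struct.unpack("<h", raw)[0]      # read as signed 16-bit

unsigned_8 = struct.unpack("<B", b"\x80")[0]  # highest bit set, unsigned
signed_8 = struct.unpack("<b", b"\x80")[0]    # highest bit set, signed

print(unsigned_16, signed_16, unsigned_8, signed_8)
```

A file size or offset that comes out negative is a strong hint that the field
should have been read as unsigned.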
Big-Endian and Little-Endian


If you paid close attention, you would have noticed that whenever we
calculated a number, the bit with the highest value was always on the left, and
the lowest value on the right. Within a single byte this order is never in
doubt. However, when a number spans more than one byte, as 16-bit, 32-bit, and
64-bit numbers do, files, programs, and computer systems differ on which byte
is stored first. So once again, we need to define some terms so that people
know what order we are talking about.


Little-Endian order stores the byte with the lowest value first, and
Big-Endian order stores the byte with the highest value first. Little-Endian
order is the one used by PCs and the one we will be using in this document;
unless stated specifically you should assume that Little-Endian order is used
in any file.

So let’s see an example. Take the following 2 bytes as they appear in a hex
editor


0C 5E


In Little-Endian ordering, the first byte holds the low part of the number and
the second byte the high part, so the value is


5E0C = 94x256 + 12 = 24076


However, in Big-Endian ordering the bytes are read in the order they appear
0C5E = 12x256 + 94 = 3166


It is always important to read the numbers in the correct order, otherwise you
will end up with numbers that are meaningless and incorrect. As mentioned, if
you don’t know which order to use, assume Little-Endian ordering. We will use
Little-Endian order for all examples in this document.
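
When reading multi-byte values from a file in practice, the ordering question
comes down to which byte of the number is stored first. The sketch below shows
both readings of the same 2 bytes using Python's built-in int.from_bytes; the
sample bytes are made up for illustration.

```python
raw = bytes([0x0C, 0x5E])   # two made-up bytes, in the order they sit in a file

little = int.from_bytes(raw, "little")   # first byte is the low part: 5E0C
big = int.from_bytes(raw, "big")         # first byte is the high part: 0C5E

print(little, big)
```

If an offset or size looks far too large for the file you have open, trying
the other byte order is a quick check.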
File Offsets


One of the most fundamental concepts in format exploring is the file offset.
A file offset is the position of a certain piece of data in a file, measured
from the first byte of the file. However, as with most computer programming, we
start our number counts at 0, not at 1. Therefore, if we are at the very
beginning of the file, before we read anything, we are at offset 0. After we
read 1 byte, we are at offset 1. Read another 6 bytes and we are at offset 7.


If the concept is a little hard to grasp, think of an offset as being a bar
that divides a file up byte-by-byte. In the diagrams below, each digit stands
for one byte. If we are at the beginning of a file, offset 0, we have a bar
right at the beginning before the first byte

|0110001011011000001011110


If we are at offset 3, we place the bar after the 3rd byte of the file, and
before the 4th byte


011|0001011011000001011110

Similarly, offset 16 places the bar after byte 16, and before byte 17


0110001011011000|001011110
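
Programs that read files keep track of the current offset in just this way. As
a small sketch, Python's io.BytesIO behaves like an open file and lets us watch
the offset move; the 32 sample bytes are made up so that each byte's value
equals its own offset.

```python
import io

# A pretend file of 32 bytes whose values are 0, 1, 2, ... 31.
stream = io.BytesIO(bytes(range(32)))

at_start = stream.tell()        # offset before anything is read
stream.read(1)
after_one = stream.tell()       # offset after reading 1 byte
stream.read(6)
after_seven = stream.tell()     # offset after reading another 6 bytes

stream.seek(16)                 # jump straight to offset 16
byte_at_16 = stream.read(1)[0]  # this is the 17th byte of the file

print(at_start, after_one, after_seven, byte_at_16)
```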
Archive Patterns


There are literally thousands of different archives out there; however, most
archives will conform to one of several basic patterns. Here we present the
basic archive patterns to help you pick the basic archive type; once you know
that, you can make progress faster.


Note the graphical samples presented here list some fields. These fields are 
not fixed, and indeed may be totally different to your archive. You should use 
the samples supplied as a guide to the overall structure only, not as a specific 
guide for a format. 


Also note that in many cases, but not all, the first few bytes indicate the
name of the format of the current archive. For instance, in the example in
Figure 1, we opened a PK4 archive from Doom 3, and its first 2 bytes gave away
the format of the archive: PKZIP (remember they read ‘PK’). We call these bytes
game resource archive identity strings (GRAIS). The length of these strings
varies between archives, and they may also be completely absent.
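
A quick first test on any new archive is therefore to look at its opening
bytes. The sketch below is a hypothetical helper in Python (the name
identity_string and the choice to accept only printable English characters are
our own assumptions); the sample data simply begins with the 4 bytes that open
a ZIP file.

```python
def identity_string(data, length=4):
    """Return the first `length` bytes as text if they are all printable."""
    head = data[:length]
    if head and all(0x20 <= byte <= 0x7E for byte in head):
        return head.decode("ascii")
    return None   # no readable identity string of this length

# Made-up sample: the bytes "PK", 03, 04, followed by filler.
zip_like = b"PK\x03\x04" + b"\x00" * 16

two_byte_tag = identity_string(zip_like, 2)
four_byte_tag = identity_string(zip_like, 4)

print(two_byte_tag, four_byte_tag)
```

Because the length of the identity string varies, it is worth trying a few
lengths rather than assuming 4 bytes.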
Directory Archives


Directory archives are by far the most common archive structure in use today. 
Directory archives work by storing a directory somewhere in the archive that 
outlines the offset of each file in the archive. 


Usually the directory is presented at the beginning of the archive (called a
header), as in the example below; occasionally, however, the directory will be
placed elsewhere in the archive (typically at the end, where it is called a
tail). If the directory is not at the start, there will be a field somewhere
that gives you the position of the directory. After all, the game itself needs
to know where the directory is. The directory offset is usually a 4-byte field
either somewhere in the archive header, or at the very end of the archive.


Here is a sample graphic representation of the archive 


Archive Header
  4 - GRAIS (String)
  4 - Number of Files
Directory
  File Entry 1
    4 - File Offset
    4 - File Size
    X - Filename
  File Entry 2
    4 - File Offset
    4 - File Size
    X - Filename
  ...
  File Entry n
    4 - File Offset
    4 - File Size
    X - Filename
File Data
  File Data 1
  File Data 2
  ...
  File Data n
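
To make the sample layout concrete, here is a minimal Python sketch that builds
a tiny archive in this shape and reads it back. It is an illustration under
stated assumptions, not the format of any real game: the "DEMO" tag, the
function names, Little-Endian 4-byte fields, and filenames terminated by a zero
byte are all choices made for the example.

```python
import struct

def build_directory_archive(tag, entries):
    """Build a small archive in the sample layout (demonstration only)."""
    directory_size = sum(8 + len(name) + 1 for name, _ in entries)
    data_start = 8 + directory_size          # file data follows the directory
    directory = b""
    body = b""
    for name, content in entries:
        directory += struct.pack("<II", data_start + len(body), len(content))
        directory += name.encode("ascii") + b"\x00"   # assumed: zero-terminated
        body += content
    return tag + struct.pack("<I", len(entries)) + directory + body

def parse_directory_archive(data):
    """Return (tag, {filename: file bytes}) for the sample layout."""
    tag = data[0:4]                                    # 4 - GRAIS
    (file_count,) = struct.unpack_from("<I", data, 4)  # 4 - Number of Files
    position = 8
    files = {}
    for _ in range(file_count):
        offset, size = struct.unpack_from("<II", data, position)
        name_end = data.index(b"\x00", position + 8)   # X - Filename
        name = data[position + 8:name_end].decode("ascii")
        files[name] = data[offset:offset + size]
        position = name_end + 1
    return tag, files

sample = build_directory_archive(
    b"DEMO", [("a.txt", b"hello"), ("b.bin", b"\x01\x02\x03")]
)
tag, files = parse_directory_archive(sample)
print(tag, files)
```

In a real archive the filename may instead be a fixed-length field or be
preceded by its length, so that part of the reader is the first thing to
adjust.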
Tree Archives


Tree archives are more complicated archives that attempt to store a complete 
directory tree structure inside a file. This is usually done by defining a group of 
directory entries, where each directory entry points to another directory entry, 
and so forth until you reach the folder that contains the files, which then 
progresses as in the directory archive. 


Here is a sample graphic representation of the archive 


Archive Header
  4 - GRAIS (String)
  4 - Number of Directories at Root
  4 - Number of Files
Directory Entries
  Directory Entry 1
    X - Filename
    4 - Subdirectory Offset
    4 - Number of Files in Directory
    4 - Number of Subdirectories in Directory
  Directory Entry 2
    X - Filename
    4 - Subdirectory Offset
    4 - Number of Files in Directory
    4 - Number of Subdirectories in Directory
  ...
  Directory Entry n
    X - Filename
    4 - Subdirectory Offset
    4 - Number of Files in Directory
    4 - Number of Subdirectories in Directory
File Entries
  File Entry 1
    4 - File Offset
    4 - File Size
    X - Filename
  File Entry 2
    4 - File Offset
    4 - File Size
    X - Filename
  ...
  File Entry n
    4 - File Offset
    4 - File Size
    X - Filename
File Data
  File Data 1
  File Data 2
  ...
  File Data n
As these archives are quite difficult to explain, I will provide an example here.
Let’s pretend we have 3 files in the directories specified below


\data\sounds\snd1.wav 
\data\sounds\snd2.wav 
\data\images\temp\pic1.bmp


The following graphic shows the structure of the archive that contains these 3 
files. 


Archive Header
  4 - GRAIS (String)                            HEAD
  4 - Number of Directories at Root             1
  4 - Number of Files                           3
Directory Entries
  Directory Entry 1
    X - Filename                                data
    4 - Subdirectory Offset                     offset to Directory Entry 2
    4 - Number of Files in Directory            0
    4 - Number of Subdirectories in Directory   2
  Directory Entry 2
    X - Filename                                sounds
    4 - Subdirectory Offset                     offset to File Entry 1
    4 - Number of Files in Directory            2
    4 - Number of Subdirectories in Directory   0
  Directory Entry 3
    X - Filename                                images
    4 - Subdirectory Offset                     offset to Directory Entry 4
    4 - Number of Files in Directory            0
    4 - Number of Subdirectories in Directory   1
  Directory Entry 4
    X - Filename                                temp
    4 - Subdirectory Offset                     offset to File Entry 3
    4 - Number of Files in Directory            1
    4 - Number of Subdirectories in Directory   0
File Entries
  File Entry 1
    4 - File Offset                             offset to File Data 1
    4 - File Size                               size of File Data 1
    X - Filename                                snd1.wav
  File Entry 2
    4 - File Offset                             offset to File Data 2
    4 - File Size                               size of File Data 2
    X - Filename                                snd2.wav
  File Entry 3
    4 - File Offset                             offset to File Data 3
    4 - File Size                               size of File Data 3
    X - Filename                                pic1.bmp
File Data
  File Data 1
  File Data 2
  File Data 3


Let’s walk through the reading of this file. 


First we read the archive header and see that there is only 1 directory at the 
root. This lets us know that we now need to read a single directory entry. 


We read directory entry 1, called data, and are told there are 2
subdirectories, and the directory entries for these subdirectories start at a
certain offset.


So we skip to the offset of the subdirectories. For each of the 2 subdirectories, 
we need to read a directory entry. The first directory entry read is called 
sounds and there are 2 files in it. The second entry is images and there is a 
single subdirectory in it. 


So we jump to the sounds offset and read 2 file entries, namely the file 
entries 1 and 2. After we have read these, we jump back to the images offset 
and read 1 directory entry, called temp, which has 1 file in it. 


We jump forward into the temp offset and read the 1 file entry. 


Using this method, we can build up a complex directory tree. This type of 
archive is usually slightly smaller in size than the plain directory archive, 
however the compromise is that it takes longer to read because you are 
jumping all over the place. For this reason, and the fact that it is a very 
complex structure, only a rare few games use this type of structure. 
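
The walk-through above can be sketched as a short recursive reader. The
following Python example rebuilds the 3-file example archive by hand and then
walks it. Several details are assumptions made only for this illustration:
names are stored in fixed 8-byte fields padded with zero bytes, all numbers are
Little-Endian, the file contents are placeholder bytes, and at each offset any
directory entries come first, followed by any file entries.

```python
import struct

NAME_LENGTH = 8  # assumption: names sit in fixed 8-byte, zero-padded fields

def name_field(name):
    return name.encode("ascii").ljust(NAME_LENGTH, b"\x00")

def dir_entry(name, offset, file_count, subdir_count):
    return name_field(name) + struct.pack("<III", offset, file_count, subdir_count)

def file_entry(offset, size, name):
    return struct.pack("<II", offset, size) + name_field(name)

sample = (
    b"HEAD" + struct.pack("<II", 1, 3)   # header: 1 root directory, 3 files
    + dir_entry("data", 32, 0, 2)        # Directory Entry 1, at offset 12
    + dir_entry("sounds", 92, 2, 0)      # Directory Entry 2, at offset 32
    + dir_entry("images", 72, 0, 1)      # Directory Entry 3, at offset 52
    + dir_entry("temp", 124, 1, 0)       # Directory Entry 4, at offset 72
    + file_entry(140, 4, "snd1.wav")     # File Entry 1, at offset 92
    + file_entry(144, 4, "snd2.wav")     # File Entry 2, at offset 108
    + file_entry(148, 3, "pic1.bmp")     # File Entry 3, at offset 124
    + b"WAV1" + b"WAV2" + b"BMP"         # placeholder file data, from offset 140
)

def read_name(data, position):
    field = data[position:position + NAME_LENGTH]
    return field.rstrip(b"\x00").decode("ascii")

def walk(data, position, dir_count, file_count, path, found):
    """Read dir_count directory entries, then file_count file entries."""
    for _ in range(dir_count):
        name = read_name(data, position)
        offset, files, subdirs = struct.unpack_from(
            "<III", data, position + NAME_LENGTH
        )
        # Jump to the entries belonging to this directory, then come back.
        walk(data, offset, subdirs, files, path + "\\" + name, found)
        position += NAME_LENGTH + 12
    for _ in range(file_count):
        offset, size = struct.unpack_from("<II", data, position)
        name = read_name(data, position + 8)
        found[path + "\\" + name] = data[offset:offset + size]
        position += 8 + NAME_LENGTH
    return found

root_dirs, total_files = struct.unpack_from("<II", sample, 4)
tree = walk(sample, 12, root_dirs, 0, "", {})
for full_path in tree:
    print(full_path)
```

The recursion mirrors the jumping back and forth described above: each
directory entry sends the reader to another offset, and the reader returns to
carry on with the next entry once that branch is finished.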
Chunked Archives


Chunked archives store their files one after the other, with each file containing 
a header giving information such as the size and type of the file. These 
archives, probably the simplest of all archives, are examined by reading the 
header of the file, skipping the file data, then repeating again for the remaining 
files until you reach the end of the archive. 


These archives are based on the Electronic Arts IFF 85 standard. A derivative
of that standard, RIFF, is used for many files including WAV audio and AVI
video.


Here is a sample graphic representation of the archive 


Archive Header
  4 - GRAIS (String)
  4 - Archive Size
File Header 1
  4 - File Type (String)
  4 - File Size
File Data 1
File Header 2
  4 - File Type (String)
  4 - File Size
File Data 2
...
File Header n
  4 - File Type (String)
  4 - File Size
File Data n


The archive header typically contains a 4-byte GRAIS, and a 4-byte field 
indicating the size of the archive. 


The file header typically contains a 4-byte type tag, and a 4-byte file size field, 
however other fields may be included such as filenames and file IDs. 
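
Reading a chunked archive is a simple loop: read a header, take or skip the
data, and repeat. The Python sketch below follows the sample layout only. The
tags, the function names, and the Little-Endian fields are assumptions for
illustration, and the archive size is assumed here to count the whole file;
real formats differ on what the size field covers and may pad each chunk to an
even length.

```python
import struct

def make_chunk(type_tag, payload):
    """Build one file header plus its data (demonstration only)."""
    return type_tag + struct.pack("<I", len(payload)) + payload

def read_chunks(data):
    """Return (tag, archive_size, [(type_tag, payload), ...])."""
    tag = data[0:4]                                      # 4 - GRAIS
    (archive_size,) = struct.unpack_from("<I", data, 4)  # 4 - Archive Size
    position = 8
    chunks = []
    while position < len(data):
        type_tag = data[position:position + 4]           # 4 - File Type
        (size,) = struct.unpack_from("<I", data, position + 4)  # 4 - File Size
        position += 8
        chunks.append((type_tag, data[position:position + size]))
        position += size                                  # move to next header
    return tag, archive_size, chunks

body = make_chunk(b"TEXT", b"hello") + make_chunk(b"DATA", b"\x01\x02")
sample = b"DEMO" + struct.pack("<I", 8 + len(body)) + body

tag, archive_size, chunks = read_chunks(sample)
print(tag, archive_size, chunks)
```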
Split Chunk Archives


These archives are similar to the chunked archive. However, each file is also 
split up into chunks. Each file chunk is the same size, which allows efficient 
use of buffers when reading the file. 


Here is a sample graphic representation of the archive 


Archive Header
  4 - GRAIS (String)
  4 - Archive Size
File Header 1
  4 - File Type (String)
  4 - File Size
  4 - Number of Chunks
  4 - Chunk Size
  File Chunk 1
  File Chunk 2
  ...
  File Chunk n
File Header 2
  4 - File Type (String)
  4 - File Size
  4 - Number of Chunks
  4 - Chunk Size
  File Chunk 1
  File Chunk 2
  ...
  File Chunk n
...
File Header n
  4 - File Type (String)
  4 - File Size
  4 - Number of Chunks
  4 - Chunk Size
  File Chunk 1
  File Chunk 2
  ...
  File Chunk n
Note that as each chunk is a fixed size, the chunk size field need only be 


specified once for each file. This field may even be specified in the archive 
header if the chunk size is fixed for all files. 
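
As a rough sketch, one file of a split chunk archive could be reassembled as follows. We assume little-endian fields in the order shown in the diagram, and that the last chunk is padded up to the chunk size so the file size field tells us where the real data ends; a real format may differ on both points.

```python
import io
import struct


def read_split_file(stream):
    """Read one file header plus its chunks; return (type_tag, file_data)."""
    type_tag, file_size, chunk_count, chunk_size = struct.unpack(
        "<4sIII", stream.read(16))
    # Every chunk has the same size, so one fixed-size buffer read per chunk works.
    chunks = [stream.read(chunk_size) for _ in range(chunk_count)]
    # Assumption: the final chunk is padded, so trim to the stored file size.
    return type_tag, b"".join(chunks)[:file_size]


# Hypothetical file: 10 bytes of data stored in 3 chunks of 4 bytes each.
header = struct.pack("<4sIII", b"TXT ", 10, 3, 4)
sample = header + b"0123456789" + b"\x00\x00"

type_tag, data = read_split_file(io.BytesIO(sample))
print(type_tag, data)
```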
External Directory Archives 


External directory archives have the same structure as the directory or tree
archive. However, the directory data and the file data are stored in 2 separate
files. Naturally, the file that contains the file data is very large, and the
directory file very small. 


Here is a sample graphic representation of the archive, where the directory
archive format is used:


file.dir

Archive Header
4 - GRAIS (String)
Directory
File Entry 1
4 - File Offset
4 - File Size
X - Filename
File Entry 2
4 - File Offset
4 - File Size
X - Filename
...
File Entry n
4 - File Offset
4 - File Size
X - Filename


file.arc

File Data
File Data 1
File Data 2
...
File Data n


Note that the 2 files both have the same name, but different extensions. Also
note that the extensions are not fixed to the extensions given in the example;
they can be anything so long as the 2 files have different extensions. 
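
Reading such a pair of files might look like the sketch below. The layout follows the diagram, but the details are assumptions on our part: little-endian fields, null-terminated ASCII filenames, offsets relative to the start of the data file, and a directory that simply runs to the end of its file.

```python
import struct


def read_external_directory(dir_bytes):
    """Parse the directory file into a list of (filename, offset, size)."""
    entries = []
    pos = 4  # skip the 4-byte GRAIS
    while pos < len(dir_bytes):
        offset, size = struct.unpack_from("<II", dir_bytes, pos)
        pos += 8
        end = dir_bytes.index(b"\x00", pos)   # filename is null-terminated
        entries.append((dir_bytes[pos:end].decode("ascii"), offset, size))
        pos = end + 1
    return entries


def extract(data_bytes, entry):
    """Cut one file out of the data file using its directory entry."""
    _, offset, size = entry
    return data_bytes[offset:offset + size]


# Hypothetical contents of file.dir and file.arc.
dir_bytes = (b"DEMO"
             + struct.pack("<II", 0, 5) + b"a.txt\x00"
             + struct.pack("<II", 5, 3) + b"b.bin\x00")
arc_bytes = b"helloxyz"

entries = read_external_directory(dir_bytes)
for entry in entries:
    print(entry[0], extract(arc_bytes, entry))
```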
Checking Your Results 


Common Types of Fields 


Archives can contain fields for just about any purpose; however, it
helps to know some of the more common fields so that you know what to look
for. 


The following fields are very common in archives, so there is a good chance
that you will run into at least one of them. 


• File Size
• File Offset
• Number Of Files
• GRAIS


The following fields occur in some archives, but significantly less often
than those listed above. 


• First File Offset
• Archive Name
• Filename Offset
• Filename Directory Offset
• Total File Data Size
• Total Directory Size
• Archive Size
• Number Of Directories
• Directory Offset
• File Extension / Type
• File ID
• Archive Version
• Filename Length
• Decompressed File Size
• Checksum
• Timestamp
Validating Your Fields 


When you think you know what a field means, it is important to validate your
findings. Among the simplest fields to check are the file offset and file size fields. If
you are presented with these two fields, simply add the first file offset to the
first file size and see whether the sum matches the second file offset. Repeat this a few
times, and you can be fairly confident that those 2 fields are correct. 
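
The offset-plus-size check is easy to automate once you have read a few candidate entries. A minimal sketch, assuming you have already collected (offset, size) pairs in directory order and that the archive does not pad its files (padding would make the check fail even for correct fields):

```python
def offsets_chain(entries):
    """True if every offset + size equals the offset of the next entry."""
    return all(offset + size == next_offset
               for (offset, size), (next_offset, _) in zip(entries, entries[1:]))


# Hypothetical values read from two candidate field pairs.
good_guess = [(16, 5), (21, 2), (23, 10)]
bad_guess = [(16, 5), (24, 2), (23, 10)]

print(offsets_chain(good_guess))
print(offsets_chain(bad_guess))
```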


Another way to validate the file offset field works if you choose a
good archive. All you need to do is go to the offset for each file and see if it
points you to a known file header. For example, if you open a sound archive
then a good header to look for is RIFF, as it indicates a *.wav sound file. 
Similarly, if opening a texture archive, look for common image headers such
as BM (*.bmp), GIF (*.gif) and JFIF (*.jpg, where the text appears a few bytes
into the file). So if you pick the right archive, you
can see whether the file offset field is correct. 
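
A small helper can do this header check for every candidate offset. The sketch below only knows the four signatures mentioned above; the function name and the sample bytes are ours. Note that short signatures such as BM will sometimes match by accident, so treat a single hit as a hint, not as proof.

```python
def identify(data, offset):
    """Guess the file type found at `offset`, or return None."""
    if data.startswith(b"RIFF", offset):
        return "RIFF (e.g. *.wav)"
    if data.startswith(b"BM", offset):
        return "*.bmp"
    if data.startswith(b"GIF", offset):
        return "*.gif"
    # In a JFIF file the text "JFIF" sits 6 bytes after the start of the file.
    if data.startswith(b"JFIF", offset + 6):
        return "*.jpg"
    return None


# Hypothetical archive data with three embedded files.
archive = (b"\x00" * 4 + b"RIFF" + b"\x00" * 8
           + b"BM" + b"\x00" * 6
           + b"\xff\xd8\xff\xe0\x00\x10JFIF\x00")

for candidate_offset in (0, 4, 16, 24):
    print(candidate_offset, identify(archive, candidate_offset))
```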


If you think that an archive compresses its files, and you have found the file 
size field, try looking for a decompressed file size field for each entry — simply 
look for a field that is always a little larger than the size for the file. 


If you locate a directory in the archive, try to find a constant file entry size if 
possible. For example, if each file entry contains a file size field and a file 
offset field, then each file entry has a size of 8 bytes. Once you know this, 
work out the size of the directory by finding the offset to the end of the 
directory and subtracting the offset to the start of the directory. When you 
divide the directory size by the file entry size, you will be able to find out the 
number of files in the archive. This number may be stored in the archive 
somewhere, usually at the start of the archive, so look out for it. Otherwise, 
you can just perform the same calculation as you did above when you go to 
implement this format in a program. 
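
The calculation above, written out as a tiny function. The offsets used in the example are made up; the 8-byte entry size matches the file size plus file offset example from the text.

```python
def count_directory_entries(directory_start, directory_end, entry_size):
    """Number of fixed-size entries between two offsets."""
    directory_size = directory_end - directory_start
    if directory_size % entry_size != 0:
        # A remainder suggests the entry size (or one of the offsets) is wrong.
        raise ValueError("directory size is not a multiple of the entry size")
    return directory_size // entry_size


# Hypothetical directory running from offset 16 to offset 816.
file_count = count_directory_entries(16, 816, 8)
print(file_count)
```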
Padding 


So you think you have determined the fields and their data types, but for some
reason you can read only a few of the entries before the values start coming out all wrong. 
This is most probably due to padding, the technique of adding null bytes to the
archive as a way of expanding the data to a fixed length. 


For example, some archive types allow you to have a filename of
arbitrary length, such as where the filename is stored followed by a null byte,
then the directory continues. As you can see, each entry will have a different
length depending on the length of the filename. This may not seem like a
problem, but when using buffers to read a file the buffer may become
misaligned. This is where padding comes into play. 


Entries in a directory are commonly padded to multiples of 4 bytes. If the 
filename length is not a multiple of 4 bytes, a number of null bytes is added to 
the end of the filename to bring it to the correct size. These bytes are to be 
ignored; they are simply there for better reading of the file. 


Let’s say you have a filename of length 7. The next multiple of 4 bytes that 
occurs is the number 8 (4x2), so only 1 null byte needs to be added. Similarly 
a filename of length 13 will need to be padded to a length of 16 (4x4) so we 
need to add 3 null bytes. 


Some archives also like to pad out their files to multiples of a number, typically 
multiples of 2048 bytes. 
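
The number of padding bytes can be worked out with a single modulo operation. A minimal sketch (the function name is ours):

```python
def padding_needed(length, multiple=4):
    """Bytes to add so that `length` becomes a multiple of `multiple`."""
    return (-length) % multiple


print(padding_needed(7))           # filename of length 7
print(padding_needed(13))          # filename of length 13
print(padding_needed(5000, 2048))  # a 5000-byte file padded to 2048-byte blocks
```

When you read such an archive, you skip this many bytes after the data before reading the next entry.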
Filename Patterns 


Filenames are a very important thing to find in an archive because they tell the
public how to open and edit each file. Some archives store filenames; however,
many do not because they take up a lot of space to store, and normally the
game does not use the names. Archives that do store filenames have plenty
of choices concerning how to do just that. 


If filenames are not used by the game, sometimes the filenames are stored in 
their own directory, thus allowing the archive to be read quickly and efficiently 
by the game (by simply skipping the filename directory completely). For this 
structure, you are usually provided with the filename directory offset 
somewhere in the archive (typically at the start of the archive), so you know 
where to find the filenames. The filenames are usually stored in the archive 
one after the other, in the same order as the files are stored in the archive. 
Each filename is normally separated by a single null byte. 


If the filenames are not stored in a separate directory, you may be lucky to 
find the filenames stored with each file entry in the directory. Wherever the 
filename is stored, they fit into one of a few different formats. 


The most common format is the storage of the filename, followed by a single 
null byte. 


A slight variation on this format is that for buffering, the filename must fit a 
multiple of a certain size. For example, filenames are usually padded to a 
multiple of 4 bytes, where if the filename length is not a multiple of 4, null 
bytes are added to the end of the filename to make it up to the correct size. 
More information on this is found in the Padding section. 


Another possible format is that each filename is stored in a fixed number of 
bytes. If the filename is too short, the filename is expanded to the correct size 
by adding null bytes to the end of the filename. If the filename is too long, it is 
just cut off at the correct size. 


On really good archives, a field just before the filename will actually tell you 
the length of the filename. This field is usually either a 4-byte field, or just a 
single byte. This is really handy because then you know exactly how long the 
filename is. The filename may or may not have a null byte following it — if it 
does then the null byte is normally included in the filename length field. 


You may see that the overwhelming majority of filename formats rely on the
filename being followed by one or more null bytes. For this reason, we
typically call a filename a null-terminated string, i.e. a string that continues until
you reach a null character. We should note here that, for those of you who
don't know, a null byte is a byte of value 0. Also, as a slight variation on these
formats, some archives will use the character 32 (the space character)
instead of the null byte. 
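
The filename formats described in this section could be read with helpers like the ones below. These are sketches under our own assumptions: names are plain ASCII, the length field is a 4-byte little-endian value, and a padded name always has at least one null byte after it. Each function returns the name and the position of the next field.

```python
import struct


def read_null_terminated(data, pos, pad_to=1):
    """Filename followed by a null byte, optionally padded to `pad_to` bytes."""
    end = data.index(b"\x00", pos)
    consumed = end - pos + 1              # name plus its terminator
    consumed += (-consumed) % pad_to      # extra null bytes used as padding
    return data[pos:end].decode("ascii"), pos + consumed


def read_fixed_width(data, pos, width):
    """Filename stored in exactly `width` bytes, filled up with null bytes."""
    raw = data[pos:pos + width]
    return raw.split(b"\x00", 1)[0].decode("ascii"), pos + width


def read_length_prefixed(data, pos):
    """4-byte length field followed by the filename."""
    (length,) = struct.unpack_from("<I", data, pos)
    raw = data[pos + 4:pos + 4 + length]
    return raw.rstrip(b"\x00").decode("ascii"), pos + 4 + length


print(read_null_terminated(b"pistol.pcx\x00rest", 0))
print(read_null_terminated(b"abcdefg\x00next", 0, pad_to=4))
print(read_fixed_width(b"a.txt\x00\x00\x00tail", 0, 8))
print(read_length_prefixed(struct.pack("<I", 5) + b"a.txtzz", 0))
```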
Encryption and Compression 
7.1 The basics 


In many cases, programmers use different techniques to either protect the
resources contained in GRAs, by encrypting them, or limit the size of the
archive by compressing the resources before creation of the archive. 

There are many different methods of encryption, which makes them hard to figure
out. Almost any programmer can come up with their own routine that will be
very difficult to crack just by looking at the archive. We will still try to shed some
light on this subject, as encrypted and compressed archives will become more
commonplace in the future. There are many tools you can utilize to crack the
codes, besides the obligatory hex editor. For instance, try to get a
disassembler, such as WinDASM, and use it to open the game’s executable. 
With the mentioned tool, you can easily show every text string that the
game uses (such as archive filenames, but also resource names!). You will
see later why this can come in very handy! Not only that, but if you are a
sophisticated programmer who knows how to interpret assembly code
("machine language"), you can try to backtrack whatever it was the executable
did to decrypt the code; more on that later as well. 


Paramount to understanding encryption is knowledge of bitwise operations. 
Bitwise operations can be regarded as simple logical steps where two bytes
undergo a bit transformation into a resulting byte. The primary operations are
AND, OR and XOR (exclusive OR). What all of these do is change the bits in the
first byte depending on the bits in the second byte. Other operations include
NOT, SHL (shift-left) and SHR (shift-right), which act on a single byte. 


AND 


The AND operation sets a bit to 1 only if both operands (read: bits) are 1,
like this: 


0 AND 1 = 0
1 AND 0 = 0
0 AND 0 = 0
1 AND 1 = 1

Example: 12 AND 123 


00001100 (12) 
01111011 (123) (AND) 
00001000 (8) 


OR 

The OR operation sets a bit to 0 only if both operands are 0. 

0 OR 0 = 0
1 OR 0 = 1
0 OR 1 = 1
1 OR 1 = 1

Example: 12 OR 123 


00001100 (12) 
01111011 (123) (OR) 
01111111 (127) 


XOR 


The Exclusive OR operation sets the resulting bit to 1 only if exactly one of the two
operands is 1. 


0 XOR 0 = 0
1 XOR 0 = 1
0 XOR 1 = 1
1 XOR 1 = 0


Example: 12 XOR 123 


00001100 (12) 
01111011 (123) (XOR) 
01110111 (119) 
NOT 


The NOT operation is unlike the previous ones, in that it only applies to one
byte. What it does is invert the bits in the byte: 


00001100 (12) 
11110011 (243) 


SHL 


The shift-left operation shifts the bits in a byte to the left, discarding the
left-most bit and filling in a zero at the right end. Basically, one
shift-left is the same as n*2^1 (with n being the original byte). Two shift-lefts are
n*2^2, three n*2^3, etc. The result must fit in a byte, so the maximum is 255;
any bits shifted beyond that are discarded. 


00110011 (51) 
01100110 (102) 


SHR 


The shift-right operation shifts the bits in a byte to the right, discarding the
right-most bit and supplementing from the left with a 0 bit. This is the same as
a division by two, rounded down whenever a set bit is
discarded. 


00110011 (51) 
00011001 (25) 
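

You can check all of the examples above for yourself. The short Python sketch below repeats them; because Python integers are not limited to 8 bits, the results of NOT and SHL are masked with &hFF (written 0xFF in Python) to keep them within one byte.

```python
a, b = 12, 123

results = {
    "AND": a & b,
    "OR": a | b,
    "XOR": a ^ b,
    "NOT": ~a & 0xFF,          # mask to one byte
    "SHL": (51 << 1) & 0xFF,   # mask to one byte
    "SHR": 51 >> 1,
}

for name, value in results.items():
    print(f"{name:3} {value:08b} ({value})")
```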


What can you do with these operations? Well, there are a lot of useful things
you can do, like super-fast divisions by two and color inverting, but also turning
specific bits on and off. Hardware, such as graphics cards, encodes specific
functionality in bits, not in bytes. For example, suppose a screen may have a
resolution of 640x480 pixels, but we only have screen memory for 4 banks of
320x200 bytes. How can we let the graphics card know which pixel to show in
what color? Our graphics card will agree to a resolution of 640x480, but only if
we divide the bytes into two and do some bank switching. What we get is
rows of 320 bytes that each actually hold two pixel positions next to each other. 
Note that each pixel only has 4 bits to work with, so the maximum color value
is all 4 bits set (15). Thus, we have a resolution of 640x480x16. Okay, but
suppose we want to set the 4th pixel from the left on the top row of the screen to
color 15, while keeping the 3rd pixel the way it is? We should address
the 2nd byte in memory and only change the high part of the byte (the left 4
bits). Now, we can't just say byte2 = 15, because that would set the right part
of our byte and clear the fourth pixel! We saw above that we have a way of
keeping certain bits in a byte as they were. We need to use the OR operation,
setting the lower 4 bits of our color byte to 0 and the top 4 bits to the color we
want the fourth pixel to have. This is easy! Remember shift-left (SHL)? We set
our color byte to 15, SHL it 4 times and then OR the 2nd byte in screen
memory with the resulting color byte: 


1. color byte = 00001111 (15) 
2. SHLx4 = 11110000 (240) 


There, now suppose the second byte in screen memory is 178, then we do 
178 OR 240 to set our pixel of interest: 


10110010 (178) — note that the third pixel (the lower half) has 
color 0010 (2), and the fourth color 1011 (11). 

11110000 (240) OR 

11110010 (242) 


We did it! We changed the fourth pixel into color 15 and kept the third pixel in 
color 2. 
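

One thing to keep in mind: the OR on its own works here because color 15 sets all four bits. For any other color you would first clear the high half of the byte with AND, otherwise old bits of the fourth pixel stay set. A small sketch of both approaches (the function name is ours):

```python
def set_high_pixel(screen_byte, color):
    """Replace the high 4 bits with `color`, keeping the low 4 bits."""
    return (screen_byte & 0x0F) | ((color & 0x0F) << 4)


# The example from the text: OR alone is enough for color 15.
print(178 | (15 << 4))
print(set_high_pixel(178, 15))

# For color 5, OR alone leaves old bits behind; AND first, then OR.
print(178 | (5 << 4))
print(set_high_pixel(178, 5))
```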


This was just one of many uses of the bitwise operations. 

Here it is important to realize you can do some hefty encrypting with this as 
well, especially if you combine them to good effect. The XOR operation is 
used many times in encryption techniques, so if you’re looking to decipher 
some archives, remember to include XOR in your way of thinking. 


7.2 Encryption 


One can ask a very simple question: what is encryption? Basically, encryption
is a way to mask the true nature of a script. Files are nothing more than
scripts that use up to 256 unique characters (bytes!) to “write” a “story”. Every
one of you has probably done some word puzzles like anagrams. Anagrams
are just another, albeit simple, way of encrypting a word or sentence. However,
true encryption requires people to be able to reverse the encryption in a
logical manner. Usually, this is not the case with anagrams. There’s no
universal logic applied when people create anagrams. However, when
programmers encrypt their files or parts of their archives (usually those parts
that contain important information, such as resource names, offsets and
sizes) they will want their game editor to be able to decrypt the
archive. As said, there’s not just one way to encrypt and that makes it
impossible to cover this subject to complete satisfaction. However, as most
encryption techniques are purely logical in nature, the smart investigator can
go a long way in determining the rudiments of the technique or even crack
the code as a whole. How to proceed? 


First of all, you should be absolutely positive that a part of the archive you are 
looking at is indeed encrypted. There’s no point in trying to decrypt a pile of 
bytes that were not encrypted in the first place. So, (1) identify an encrypted 
block of bytes. If you are sure, try to find more examples of files that use this 
supposed technique. In other words, if you have discovered a putative but 
encrypted GRA, (2) try to find more and use them to compare specific parts; 
many games have more than one archive, but most use only one way of 
archiving. If you are positive that you are dealing with an archive and you can 
identify some directory structure, you can also use this structure to compare
the way individual (encrypted) directory entries are shown within the one
archive. (3) Use a good disassembler to open the game’s executable to 
search for resource name strings. You might just find a lot of strings there. If 
you can identify a resource by name, and you have already figured out where 
in the archive this resource is saved, you have a good reference for your 
decrypting. Suppose you have stumbled on a GRAF that encrypts the names 
of the resources before saving this information in an archive. Upon 
examination of the executable you may have found a string 
“/textures/weapons/pistol.pcx”. Suppose furthermore that you already 
extracted some of the textures as nameless pictures. If you recognize the 
pistol texture from the game, you then have something to work with while 
decrypting, as you can compare the encrypted string with the decrypted. 


But how about the actual decrypting? Where to start? 

As said, there’s not just one method of encryption. One trick that will tell you 
exactly what is going on is to run the executable from a disassembler and 
reverse engineer the assembly code that takes care of the encryption. This 
requires substantial knowledge of assembly language, though, and may be a 
slow and largely unrewarding process. Nevertheless, if you can pinpoint the 
location of the encryption/decryption code this may help you understand parts 
of it anyway and may be very worthwhile in the long run. As your
understanding of these assembly processes grows, you will figure these things
out more easily and quickly! 


Pen and paper ready! 


Without the aid of reverse engineering it is still possible to find out how
encryption works. Pen and paper are two very handy tools that can help you
solve the puzzle. We will try to explain the process of unravelling the Painkiller
.PAK string encryption technique. 


Painkiller Encryption 


When the developers of the game Painkiller first released a demo, fans 
quickly discovered that the game used adapted PKZip files to store the 
resources (so called .PAK archives). Apparently, the coders did not want fans 
to have access to the resources, because in the second demo and the first 
retail version, they used another, more difficult to hack method of storing 
resources. While they used straightforward properties such as resource size 
and resource offset variables, they encrypted the filename of the resources. In 
addition, they changed the method to compress the actual resources to Zlib 
compression. People quickly worked out the whole format and could
extract the resources, except for the encrypted filenames. Those proved the hardest
puzzle to solve. 
In the archive “scripts.pak” the first few encrypted filename strings are shown
in ASCII like this (non-printable bytes appear as a dot): 

cLHLCB.P[\XIM.m,) 

lOOO\A.[WVSJ/4j/&# 

iJRRYD.IR\*&1/&'(a8=< 


Or, in hexadecimal: 

63 4C 48 4C 43 42 1C 50 5B 5C 58 49 4D 2E 6D 2C 29 20 

6C 4F 4F 4F 5C 41 1B 5B 57 56 53 4A 2F 34 6A 2F 26 23 

69 4A 52 52 59 44 16 49 52 5C 2A 26 31 2F 26 27 28 61 38 3D 3C 


The first thing you do is take a good look at what you see. Try to see what can 
be seen. That may sound cryptic, but it's true. 

In our case, what do we know? Well, we know that the strings we are
examining represent some kind of encrypted text. We also assume these
strings are in fact filenames. Good. That gives us something to go on. In
addition, you did not know this before, but we tell you now: the 32-bit (4-byte)
value that precedes each string is exactly the size of the string in bytes, and
thus it is probably safe to assume that the characters of the encrypted
string match in position the characters of the original string. This is very
important, because with this knowledge we can make other assumptions: 


Filenames usually have the following structure: 


Directory\directory\filename.extension 
(the number of directories may vary) 


The extensions of filenames are usually 3 bytes in length. 
Like: “text.doc”, where the “doc” is the extension. 


Right, now take a good look at the strings. You will notice that they all start
with a byte somewhere in the &h60+ range, followed by characters in the
&h40-&h60 range, followed by a single byte in the &h10+ range. Wait a
minute! The standard filename structure as shown above starts with a capital
character and is then followed by small characters, followed by a backslash
(e.g. Directory\). You should know that non-letter characters mostly have an
ASCII value (decimal value) which is far from that of the letter characters. Thus, we
recognize this pattern in the encryption example: 

<capital><small characters><backslash>...etc. Without knowing the original
value of the letter characters, we can assume that the encrypted values
&h1C, &h1B and &h16 in the subsequent strings are in fact '\', or &h5C! We
keep in mind that this may also be a forward slash though, '/', or &h2F. 

Close examination of the other characters in the strings reveals that the last
three characters of each string are all preceded by a byte value in the &h60+
range. In the standard filename structure the last three characters commonly
represent the extension of the file, with the character before the extension
always a '.' or &h2E. So it's safe to assume the following structure of the
encrypted strings: 
<Capital, small characters>'\'<small characters>'.'<small characters> 
Or, in function: 

<Directory>'\'<filename>'.'<extension> 

And applied to the example of encrypted string 1 (the byte &h1C stands in for
the slash, and 'm', or &h6D, for the dot): 
<cLHLCB> &h1C <P[\XIM.> &h6D <,) > 


However, we still don't know what, for instance, “cLHLCB” stands for. 
Apparently there's some algorithm at work here. Remember that you have 
logical ways to alter existing bytes, using XOR, AND and others? Let's 
consider the often-used XOR. Could we use XOR to create such a string? 
First of all, you'd need to be able to "seed" the XOR encryption: Every string 
has a first letter, in our case the capital character. To get a XORed value for 
this capital, you need another byte to XOR it with. This is your "seed" value. 
Next, you perform some trick on the XOR value, before you use it to XOR the 
second character in your string, then you perform the trick on the XOR value 
again to XOR the third character and so forth. The crucial XOR thing is: by 
XORing a value by the same value it was XORed with, you retrieve the value 
that was XORed. In our case, a XOR method much like the one described 
was assumed, also because some people used a disassembler to look at the 
encryption code, and came across some XOR statements. And that really fits 
nicely, because why go to the trouble of encrypting your filenames, if you 
won't need to decrypt them. 

Thus, there must be some algorithm at work here which can be used in 
reverse as well: by applying the algorithm to the original text you will get the 
encrypted text, by applying the same algorithm to the encrypted text you will 
get the original text. 

XOR is one trick you can use to create just such an algorithm. We know that
we have a '\' (or '/') and a '.' in our strings. Let's take the first string. We can
find out what the value must have been that was used to XOR our '\' or '.' and
resulted in the encrypted codes for these characters, &h1C and &h6D
respectively. Simply XOR them with the original value! 


Tip: In Hex Workshop, you can select any range of bytes and XOR it with a 
value of your choice. Click on the "^" button at the top toolbar to do so. In this 
case, you would select a byte of interest and select an 8-bit unsigned value to 
XOR it with. The outcome would replace the selected byte. 


Thus &h1C XOR &h5C = &h40, &h6D XOR &h2E = &h43. 
But we kept in mind that the proposed backslash may also be a forward slash. 
That would give &h1C XOR &h2F = &h33. 
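
If you do not have Hex Workshop at hand, the same three XORs take only a few lines of Python. This sketch simply repeats the arithmetic from the text; the function name is ours.

```python
def recover_xor_key(encrypted_byte, assumed_plain_char):
    """XOR an encrypted byte with the character we think it hides."""
    return encrypted_byte ^ ord(assumed_plain_char)


key_backslash = recover_xor_key(0x1C, "\\")
key_slash = recover_xor_key(0x1C, "/")
key_dot = recover_xor_key(0x6D, ".")

print(hex(key_backslash), hex(key_slash), hex(key_dot))
```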


Now, let's consider this difference in XOR value used. The distance between
the back- or forward slash and the dot is 8 bytes. The difference in XOR value
used is either &h43 - &h40 = &h3 (3) in case a backslash was used, or it is
&h43 - &h33 = &h10 (16) in case of a forward slash. Could it be as simple as
adding two to the XOR value (hence 16 / 8 bytes distance = 2) for each
subsequent XOR you do? We can test this by starting from the forward slash
(if the theory is to have a chance, we are dealing with the forward slash after all) and
XORing the string characters up to the last character in the string. Thus, we will
use &h33 for the forward slash, then use &h35, &h37, &h39, &h3B, &h3D, 
&h3F, &h41, &h43 (our dot!), &h45, &h47 and &h49 respectively to XOR the
next characters. 


.P[\XIM.m,) = /electro.ini 


So, it is indeed like that. Now we do this for the whole string (i.e. start from the 
forward slash with the XOR value of &h33 and subtract 2 from this value, 
XOR the characters all the way to the first): 


cLHLCB.P[\XIM.m,) = Decals/electro.ini 
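

The same test can be run in code: pick the byte we believe is the slash, work out its XOR value, and step that value by 2 in both directions. The sketch below uses the hex bytes of the first string as listed earlier; the function name and parameters are our own.

```python
ENCRYPTED_1 = bytes.fromhex(
    "63 4C 48 4C 43 42 1C 50 5B 5C 58 49 4D 2E 6D 2C 29 20")


def decrypt_from_anchor(encrypted, anchor_index, anchor_plain, step=2):
    """Decrypt by assuming the byte at `anchor_index` hides `anchor_plain`."""
    anchor_key = encrypted[anchor_index] ^ ord(anchor_plain)
    plain = []
    for index, byte in enumerate(encrypted):
        # The XOR value grows by `step` per character after the anchor
        # and shrinks by `step` per character before it.
        key = (anchor_key + step * (index - anchor_index)) & 0xFF
        plain.append(byte ^ key)
    return bytes(plain).decode("ascii")


# The &h1C byte sits at position 6 (counting from 0).
print(decrypt_from_anchor(ENCRYPTED_1, 6, "/"))
```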


So the first string is decrypted! But we still do not know how the “seed” XOR
was calculated. The first character in our original string is the letter D (&h44). 
The encrypted value was &h63, thus the XOR used was &h27 (39). This is
our seed value for the XOR method of encrypting the string. But how was it
calculated? 


We should first check the other strings and see what seed value we obtain for 
those. The above-described method (pinpoint the encrypted forward slash, 
XOR it with &h2F, use the XOR value to XOR the next encrypted characters 
after subsequent additions of 2, and the previous encrypted characters after 
subsequent subtractions of 2) will give: 


lOOO\A.[WVSJ/4j/&# = Decals/molotov.ini 
iJRRYD.IR\*&1/&'(a8=< = Decals/rockethole.ini 


We find that the encryption of the second string was "seeded" with &h28 (40) 
and the third with &h2D (45). You should notice they are all somewhat in the 
same range. 

We assume that the program that wishes to decrypt the strings (like the 
developers' game editor for instance) can get its "seeds" based on variables 
from the archive, and this should be specific per string (or in effect per 
resource). Let's see what other variables we have. When you look at the 
whole GRAF of the Painkiller .PAK files you will see resource offsets and 
sizes, as well as the size in bytes of the filename strings, among other stuff. 
Now compare the sizes of the strings. The first is &h12 (18), the second also 
&h12 (18) and the third is &h15 (21). These variables you find as 32-bit values 
saved just before the encrypted string. Compare the "seeds" we found for 
each string. The first was &h27 (39), the second &h28 (40) and the third &h2D 
(45). Notice how the first and second string are equal in size and their "seed" 
is only a difference of 1. 

To start with, let's propose that the encryption method uses the string size 
variables to calculate the "seed". Strings 1 and 2 both have the same size, so 
in theory they should end up with the same "seed". However, string 2 has a 
"seed" value of 1 higher than string 1. Well, it is the second string after all, so 
perhaps the method will take into consideration the position of the resource in 
the file (the first one is file 1, the second file 2 and so forth). When the seed
is calculated, the final value could be incremented with the position in
the file. This way the difference between the seed values of strings 1 and 2,
which have the same size value, would indeed be 1. This then implies that the
calculated "seed" value less the position in the file would be
&h27 - 1 = &h26 (38) for the first string, &h28 - 2 = &h26 (38) for the second and
&h2D - 3 = &h2A (42) for the third. 

Assuming the above is correct, then how can we obtain the “seed” value of 38 
for the first string? This is not easy, but we just try a number of ideas. The size 
variable for the first string is &h12 (18). If we do a shift left of this number 
(multiplication of 2) we get &h24 (36). This is rather close to 38, isn’t it? How 
about the third string? A shift left of &h15 gives &h2A (42). Hmm, this is 
exactly the seed value of the third string (not adding the position in the file)! 
Thus, if we calculate it this way for the first and second string we get a value
that is 2 lower than what it should be, and for the third the difference between
our calculation and what it should be is 0. Perhaps we are mistaken, and the
method is different? Well, we are trying to understand, and obviously we
haven't cracked it completely. We will still stay on track though and keep the
shift left as it is rather close to the actual “seed” value. 
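
Pen and paper work fine for three strings, but a small script makes it easier to repeat the comparison for many more. The sketch below rebuilds the numbers from the last few paragraphs for the three example strings, using the fact that each original filename starts with the letter D.

```python
ENCRYPTED = [
    bytes.fromhex("63 4C 48 4C 43 42 1C 50 5B 5C 58 49 4D 2E 6D 2C 29 20"),
    bytes.fromhex("6C 4F 4F 4F 5C 41 1B 5B 57 56 53 4A 2F 34 6A 2F 26 23"),
    bytes.fromhex("69 4A 52 52 59 44 16 49 52 5C 2A 26 31 2F 26 27 28 61 38 3D 3C"),
]

rows = []
for position, encrypted in enumerate(ENCRYPTED, start=1):
    size = len(encrypted)
    seed = encrypted[0] ^ ord("D")   # every filename here starts with "D"
    base = seed - position           # seed less the position in the file
    shifted = size << 1              # shift left of the size variable
    rows.append((size, seed, base, shifted, shifted - base))

for row in rows:
    print("size=%d seed=%d seed-position=%d shl(size)=%d difference=%d" % row)
```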

More information is needed at times like this, and it is advised to apply your 
proposed methods on many cases of whatever is encrypted. In our case, we 
must check more strings to map potential differences in shift left value of the 
size variable and the "seed" value. We will not show it here, but we'll present 
the number of possibilities that you will get if you do so. We find that our shift 
left of the size variable differs from the "seed" value in this range: 


shift left (size) - seed value = {-3, -2, -1, 0, 1} 


So the maximum difference is 3 less than the "seed", or the other way, 1 more 
than the "seed". This does point to some kind of tabular method, although the 
range is not very symmetrical. More symmetrical would be a range of (-2, -1, 
0, 1, 2). Let's see if we can alter our method so we have values that differ in 
that more symmetrical range. What would be needed? Well, all range values 
would have to be incremented with 1, right? -3 + 1 = -2, -2 + 1 = -1 etc. 

For us it means that our shift left value of the size variable is one off. 
Remember that the common operation shift left shifts all bits one to the left,
discarding the left-most bit. However, there are also adaptations of the shift
left method. First, there's rotate left, which will rotate the bits one to the left, and
the left-most bit will be rotated to the right-most bit. Thus, if the left-most bit is
set it will transfer to the right-most bit. If that is the case it will result in a
decimal calculation of value * 2 + 1. And that is what we need as well. 
However, if the left-most bit is not set, the right-most bit will simply be 0. In our
case of the size variables, the left-most bit is never set, so the rotate left
method will not get us where we need to be. The second adaptation of the
shift left method is inclusive shift left. This will shift the bits 1 to the left, but
now set the right-most bit as well, instead of adding a zero bit. To do that just
shift the bits left and add 1! This method will get us where we want to be. 
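
The three variants are easy to confuse, so here they are side by side as a sketch for 8-bit values (the function names are ours). For a size such as &h12 the left-most bit is not set, so plain shift left and rotate left give the same answer and only the inclusive shift adds the extra 1.

```python
def shift_left(value):
    """Plain shift left: the left-most bit is discarded."""
    return (value << 1) & 0xFF


def rotate_left(value):
    """Rotate left: the left-most bit reappears as the right-most bit."""
    return ((value << 1) | (value >> 7)) & 0xFF


def inclusive_shift_left(value):
    """Shift left and always set the right-most bit."""
    return ((value << 1) | 1) & 0xFF


for value in (0x12, 0x15, 0x81):
    print(hex(value), shift_left(value), rotate_left(value),
          inclusive_shift_left(value))
```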
The method to calculate the “seed” so far is: 

Shift left (size variable) + 1 

And our “seed” value will differ from the real “seed” value in the range: 
{-2, -1, 0, 1, 2} 


Apparently, the encryption process has some way of telling when to add or
subtract these values from the result of the shift left + 1 operation, as if looking
them up from a table. What would this table look like, and how would the process know
where to look in the table? The only way to get a hint of this process is by
examining multiple strings and comparing the “seeds” with the size
variable, as we will assume that the size variable is also needed to look up the
value from the unknown table (as string 1 and string 2 have only the same size in
common, this suggests that the encryption uses only this variable
to encrypt). 


So, make a table from a size variable of 0 upwards. Look at the strings and
find the seed, write down the range value it used (-2 or -1 and so forth). Well,
perhaps you won't find strings that are 0 in length, but just fill in those that you
do find. 


Thus the table will be a single dimensional table like: 


Size Code 

0 Code1 
1 Code2 
2 Code3 
3 Code4 


If you do this, you will discover the following table: 
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Size (hexadecimal) Code 


[Table rows not recoverable. The examples below use code 1 for size &h12 and 
code -1 for size &h15.] 


Etc. 


Let's examine the first string again. It has a size of &h12, which means the 
code will be 1. The seed is then calculated like this: 


(Shift left (&h12) + 1) + 1 (from file 1) + 1 (Code) = &h27 ! 

The second string: 

(Shift left (&h12) + 1) + 2 (from file 1) + 1 (Code) = &h28 ! 

The third string: 

(Shift left (&h15) + 1) + 3 (from file 1) + -1 (Code) = &h2D ! 

These are the XOR values that will be used on the first character of the 
original string. For each subsequent character this value will be incremented 
by 2. The complete encryption algorithm is then: 

Painkiller .PAK string encryption: 

EncryptedStringCharacter(n) = 
OriginalStringCharacter(n) XOR (shift left (string size) + 1 + FileNumber + 
Code(string size) + n*2) 


(Where n starts at character position 0) 
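
The formula can be written out as a short sketch. The function names are ours, and we pass the code in as a parameter because it has to come from your own size-to-code table. Truncating each key to one byte before the XOR is our assumption.

```python
def painkiller_seed(size: int, file_number: int, code: int) -> int:
    """XOR value for character 0: shift left (size) + 1 + FileNumber + Code."""
    return (size << 1) + 1 + file_number + code


def xor_string(data: bytes, file_number: int, code: int) -> bytes:
    """Apply the XOR stream; applying it twice restores the input."""
    seed = painkiller_seed(len(data), file_number, code)
    # Assumption: each key is truncated to one byte before the XOR.
    return bytes(b ^ ((seed + 2 * n) & 0xFF) for n, b in enumerate(data))


print(hex(painkiller_seed(0x12, 1, 1)))    # 0x27, the first string
print(hex(painkiller_seed(0x15, 3, -1)))   # 0x2d, the third string
```

Because XOR is its own inverse and the string length does not change, the same function both encrypts and decrypts.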

Encryption methods are many, but so are your brain cells 


The Painkiller .PAK string encryption is just one of many ways to encrypt, and 
there's no universal tool or process to decrypt all of them. Many archives 
encrypt whole resources, while some use irreversible encryption techniques 
to address resource names (Microprose .CAT archives from Gunship! and 
others). As the uncovering of the Painkiller .PAK string encryption should 
show, solving the puzzle takes a lot of guessing and second-guessing, besides 
a logical mind. You should train yourself in recognizing logical patterns; think 
along the lines of file structures, bytes and bits. With pen and paper you can 
write down notes, compare things more easily, write down binary values, and 
try different logical methods to get where you need to be. It really is helpful. 
In conclusion, the most important ability you need to solve encryption 
techniques is the ability to recognize logical patterns. Good luck! 


7.3 Compression 


Like encryption, the subject of compression is one we will not be able to 
address to our full satisfaction. There are many ways to compress (see the 
appendix for links to websites that cover this subject), and different techniques 
are needed to compress different types of file effectively. 

However, we can point out some ways to discover the compression method 
used. 


ZLib compressed files 


ZLib (http://www.gzip.org/zlib/) is a free and open source project that is used 
in many games to compress files and archives, and is comparable in power to 
others such as RAR or PKZIP. Because it is free and needs no licensing deal, 
it is used a lot. But how do you know if a file is ZLib compressed? 


This is probably one of the easiest ones to spot. Check out the screenshot of 
HW with an open .TRE file (the format from Star Wars). 


Figure 2. ZLib compression technique can be identified by the ‘x’ as the first 
byte. 


In the figure, we have highlighted a piece that begins with the letter ‘x’ (&h78, 
120). When you encounter such a piece and you have reason to suspect that an 
individual file begins at that location in the archive, you may wish to treat it as 
a ZLib compressed file. You do have to make sure though. Suppose you have 
figured out that a file starts at a certain offset in an archive, and in addition 
you have found variables that tell you the size of this file. Now, just copy and 
paste the whole putative file into a new file and save it. The next part is 
tricky if you have no experience in unpacking ZLib files. What you need is a 
simple program that will unpack a standard ZLib-compressed file: you point it 
to the file, tell it to unpack, and it tells you whether it succeeded. Some of 
these programs will also require you to specify an “uncompressed” size (the 
original size of the compressed file). With luck, you have already found this 
variable in the archive as well. When you think you are dealing with 
compressed files, remember to look out for a variable that is slightly greater 
than the “compressed size” variable. You can either create your own program 
that tries to unpack any given file using zlib.dll, or use an existing one. Check 
out the http://www.gzip.org/zlib/ website for more information. 
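
If you have Python at hand, its standard zlib module can play the part of the "simple program". The sketch below is one way to do it, not the only one; the function names are ours. The header test follows the zlib format: the low four bits of the first byte are 8 for deflate, and the first two bytes read as one big-endian number are a multiple of 31.

```python
import zlib


def looks_like_zlib(data: bytes) -> bool:
    """Check the 2-byte zlib header before trying to unpack."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # Low nibble 8 = deflate; the two bytes read big-endian are a multiple of 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


def try_unpack(data: bytes):
    """Return the unpacked bytes, or None if zlib rejects the data."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None


packed = zlib.compress(b"hello" * 100)
print(hex(packed[0]))                  # 0x78, the letter 'x'
print(looks_like_zlib(packed))         # True
print(len(try_unpack(packed)))         # 500
print(try_unpack(b"xnot zlib at all")) # None
```

If the archive also stores an uncompressed size, compare it with the length of the unpacked result as a final check.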


PKZip compressed files 


Many games nowadays use the ‘standard’ PKZip compression to compress 
individual files, and some even just pack all their files into an actual .ZIP file, 
now and then changing the file extension for their own purposes. Examples of 
the latter include Quake 3 .PK3 files, Thief 2 .CRF files and Fallout Tactics 
.BOS files. Some, however, use the technique itself on individual files and 
pack them into an archive of their own format. This will not be easy to 
determine. You may get as far as knowing, or at least assuming, that a resource 
is compressed, but determining with certainty that it is PKZip-compressed is 
not possible without in-depth investigation on the one hand, or a strategy of 
elimination on the other. This holds true for many if not all compression 
techniques you wish to identify. 
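
The easy case, an actual .ZIP file with a changed extension, can be checked in a few lines. This is a sketch using Python's standard zipfile module; the function name and the demo filename are ours.

```python
import io
import zipfile


def list_if_zip(data: bytes):
    """Return the member names if data is a ZIP archive, else None."""
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return None
    with zipfile.ZipFile(stream) as archive:
        return archive.namelist()


# Build a small ZIP in memory to stand in for a renamed game archive.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("maps/demo.bsp", b"not a real map")

print(list_if_zip(demo.getvalue()))      # ['maps/demo.bsp']
print(list_if_zip(b"PACK" + bytes(64)))  # None
```

This only recognises complete ZIP archives. It says nothing about individual resources that were compressed with the same technique and stored in a custom archive.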


In depth? 


One option is to open the executable that is likely to process the GRA in a 
disassembler. This is extremely time-consuming, however, if you do not have 
vast experience in assembly language. The trick is to trace the code that 
handles the uncompressing of resources and subsequently reverse engineer 
the method! Naturally, this is not the easiest of tasks. If you wish to go ahead, 
we recommend a disassembler such as WinDASM (no longer in development, 
but one of the best around; see if you can find it on the web). 


Eliminate them! 


Another strategy is to eliminate candidate compression routines by 1. simply 
trying them on your file to see if they succeed, or 2. comparing the structure 
of your file with that of files that were compressed using a certain technique. 
Eliminate all those you can find, one by one, until you (hopefully) find the one 
you are looking for. You should increase your knowledge base on 
compression techniques, as many a game coder thinks it's cool to use some 
obscure technique from the past. 
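
The first half of this strategy is easy to automate. The sketch below tries the decompressors that ship with Python; the list of candidates is our choice and is nowhere near complete, so an empty result only rules out these few.

```python
import bz2
import gzip
import lzma
import zlib

# Candidate decompressors from the Python standard library (our choice of list).
CANDIDATES = {
    "zlib": zlib.decompress,
    "raw deflate": lambda d: zlib.decompress(d, -15),
    "gzip": gzip.decompress,
    "bzip2": bz2.decompress,
    "lzma/xz": lzma.decompress,
}


def surviving_candidates(data: bytes, expected_size=None):
    """Try every candidate; keep those that unpack without an error."""
    survivors = []
    for name, unpack in CANDIDATES.items():
        try:
            result = unpack(data)
        except Exception:  # each library raises its own error type
            continue
        if expected_size is None or len(result) == expected_size:
            survivors.append(name)
    return survivors


print(surviving_candidates(bz2.compress(b"abc" * 50), expected_size=150))
```

Passing the uncompressed size, when the archive stores one, guards against a routine that happens to accept the data but produces garbage of the wrong length.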
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Snoop around the executables for clues 


You can also use programs such as WinDASM or Hex Workshop to look 
around the game's executables (e.g. .EXE, .COM, .DLL etc.) and see if you 
can identify a piece of text that will tell you more. Sometimes you can find 
names of functions that are called by the program, or you can find clues in 
error messages that are saved in the executables. For instance, you could 
find a ‘LZHUncompress’ function name, which may point to the .LZH 
compression technique, or you could find an error message ‘RAR: Error in 
CRC’ that tells you .RAR was used. Likewise, some techniques require a 
licensing deal with the patent holders, so you should examine the credits of a 
game for a candidate technique (e.g. ‘Bink Video Compression’). 
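
Searching by eye gets tedious for large executables. A rough sketch of an automated search follows. The clue words are examples we picked, and the function name is ours.

```python
import re

# Clue words are examples only; extend the list as your knowledge grows.
CLUES = (b"zlib", b"inflate", b"deflate", b"lzh", b"lzss", b"rar", b"bink")


def find_clues(data: bytes, min_length: int = 4):
    """Return (offset, text) for printable ASCII strings containing a clue."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    hits = []
    for match in pattern.finditer(data):
        text = match.group()
        if any(clue in text.lower() for clue in CLUES):
            hits.append((match.start(), text.decode("ascii")))
    return hits


blob = (b"\x00\x01LZHUncompress\x00\xff\xfeRAR: Error in CRC"
        b"\x00abc\x00plain text here\x00")
print(find_clues(blob))
```

On a real game you would read the executable into memory first, for example with open("game.exe", "rb").read(), where game.exe is a placeholder name. Expect false hits: short clue words turn up inside unrelated text.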
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Worked Examples 


In this chapter we will go through the process of cracking a format step by 
step, from opening it for the first time in Hex Workshop to the final format 
presentation. To do this, we will cover a relatively easy format. In the 
appendix you will find more complete examples. Let's start off immediately, 
because the sooner you get to grips with it, the sooner you can start cracking 
your own! 


Quake *.PAK 


Games using the Quake (1 or 2) engine save their resources in archives with 
the .PAK format. How did we tell? Well, when we encounter a new game and 
want to know which files to examine, we focus on A. large files and B. the 
title of the file. In the case of Quake we really find only one large file, 
pak0.pak, and the title and extension clearly indicate it may be some kind of 
package! We recommend that you download the free Quake2 demo, as the 
pak0.pak from the demo will be used in this tutorial. 


Open the archive and check out the basics: 


A. Any GRAIS visible? 
B. Determine the size of the archive 
C. Locate filename strings if possible 
   - If not at the start of the archive, go through the whole 
     archive, starting from the back 


When we open the demo's pak0.pak with Hex Workshop we see the following 
(Figure 3A). 


Figure 3A. Start of Quake 2's pak0.pak file in Hex Workshop. 


The status bar shows the size of the file at the bottom right: 49951322 bytes! 
Also, you can see the cursor is at offset 0. Immediately you notice that the 
ASCII window shows something interesting: the first 4 bytes make up a 
plausible English word, “PACK”. Could this be our GRAIS? Quite possibly! 
We select the 4 bytes that make up the word and right-click the selection to 
get the context menu. Here we select Add Bookmark... and write down 
'GRAIS' in the description field that we are shown. Before we click Ok we set 
the Interpret data as... field to ‘string’, as we want HW to show the 
appropriate string value in the Results Window (see User's Bookmarks in 
Figure 1). Good, next we click Ok and you should see something similar to 
Figure 3B in the results window (by default bottom right). 


© 2004 Mr.Mouse and WATTO 47 


THE DEFINITIVE GUIDE TO EXPLORING FILE FORMATS 


Figure 3B. A bookmark in the Results Window of Hex Workshop. 


So now we have determined that the first 4 bytes represent the GRAIS and 
have bookmarked it accordingly. We could colour map it as well, to find it 
again easily on screen. It is up to you if you wish to do that. 


We continue to examine the following bytes. We need to find out if there are 
some filename strings. 


We can't see any at the beginning of the file, can we? Apart from PACK we 
don't see anything that even resembles a string. Then we remember that in 
many instances resource information, such as filenames, is saved at the 
back of the archive (in a Tail). 


Examine the end of the file for filename strings 


Simply hit the End key on your keyboard to show the end of the pak0.pak file. 
You should see something like Figure 3C. 


Figure 3C. A snippet from the end of Quake 2's pak0.pak (demo version). 

Oh! Check out those strings! Something like 
models/weapons/v_machn/skin.pcx is quite obviously a filename, no doubt 
about that. And there are many more. But that's not all. What else is there to 
notice? Well, let's take a closer look at the length of these strings. Just select 
one string (drag your mouse pointer from the first character of a string to the 
last while holding the left mouse button). 


TIP: During selection, notice how the status bar at the bottom of HW 
shows a value for a variable named Sel:. This shows the size of the 
current selection in hexadecimal! As you drag, this value increases or 
decreases as you select more or fewer characters respectively. 

Did you select a string? Read the size of the selection (in effect, the string) in 
the status bar. Now select another string. Once again, read the size of it. After 
a couple of times you will notice that these strings have variable sizes. So, the 
filenames can be any size in this archive apparently. But would this also mean 
that we have resource entries in this putative tail that are of variable length? 
Let's check this. As most GRAFs go, filenames and other information such as 
file offsets and sizes are all saved together into individual entries (blocks). 
Logic would dictate that each variable would be saved at the same place in 
the block. So, for argument's sake we'll assume that the program that created 
the archive saves the filename of a resource first, and then other information 
about the resource. If this is so, then we can easily see whether these blocks 
are also of variable length. To do that, first put the cursor on the first character 
of a filename string. Now select every byte until you reach the first character 
of the following filename string. In Figure 3D we have done this. 




Figure 3D. A block (entry) in the tail of Quake 2's pak0.pak is selected. 


We then read the hexadecimal size of this block: 0x40, or 64 in decimal. 
Whichever entry we select, its size is 0x40 each time. This tells us that the 
size of each block is fixed, regardless of the length of the filename string, 
which we have identified as being variable. 


Let's go back to the start of the file. Hit Home on your keyboard. Before we 
can go on we need to know how any program that opens .PAK archives 
knows where to find these filename strings. Well, if it is truly a tail and not just 
some chunk of the last resource saved in the archive, there are usually two 
possibilities. First, it may be that the program knows the information is at the 
back, and perhaps even at a set offset from the end of the file. Second, the 
program does not know where to find the start of the tail and needs to be told. 
The variable that points to the address of the tail is almost always found at 
the start of the archive. That's why we jumped back to the start! Now let's 
examine the bytes that come after the PACK string more closely. Set the 
cursor on the first byte after the ‘K’. Look at Figure 3E to compare. 


Figure 3E. At the start of pak0.pak. The cursor is at offset 4. The Data 
Inspector shows the relative (interpreted) values at this location. 


The bytes are, in sequential order, DA 1D F9 02 80 14 .... Hold on! We 
remember there are different data types, ranging from 8 bit to 64 bit values. 
Let's see what we get if we interpret the bytes at this offset as different data 
types. Look at your Data Inspector. Notice how the values differ per 
interpretation. Typically, the first four bytes ‘look’ like something valid. Why? 
Most variables in a GRA are 32-bit, and the most significant byte (the fourth 
or right-most byte) in our 32-bit type is a low value. Once you have spent a 
lot of time on this type of puzzling, you will understand why a 4-byte sequence 
with a low-value fourth byte might raise your attention. 

In this case, if we interpret the value at offset 4 as a 32-bit data type, the value 
is 49880538 (the Long datatype). 
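
What the Data Inspector does can be reproduced with Python's struct module. This is just an illustration of the interpretations, using the six bytes quoted above.

```python
import struct

raw = bytes.fromhex("DA1DF9028014")  # the six bytes quoted above, from offset 4

# "<" selects little-endian byte order: the least significant byte comes
# first, so the fourth byte (02) is the most significant one of the long.
print(struct.unpack_from("<B", raw)[0])   # 8-bit:  218
print(struct.unpack_from("<H", raw)[0])   # 16-bit: 7642
print(struct.unpack_from("<I", raw)[0])   # 32-bit: 49880538

# Read big-endian instead, the same four bytes give a value far larger
# than the archive itself, so that reading can be ruled out.
print(struct.unpack_from(">I", raw)[0] > 49951322)  # True
```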

Now wait a minute, doesn't this value sound familiar? Yes, it does, as the size 
of the file is 49951322. Our long value is less than the size, but only just! Let's 
see if this might be a pointer to somewhere fun in the GRA. Select the 4 bytes 
that make up the long value. Notice how the Data Inspector only shows 32-bit 
or less values as you do that. Now, right-click the selected bytes and select 
GoTo->Offset 0x02F91DDA. You have jumped to offset 49880538 (see Figure 
3F). 




Bingo! You see the cursor is at the first character of a filename string: 
env/unit1_rt.pcx. Scroll up to see that no other filename strings come before 
that one. You have found a pointer to a tail! Good work! 

And as we saw before, this tail runs all the way to the end of the file. Let's 
calculate the tail size by subtracting the tail pointer from the size of the 
archive. 


Tip: Go to the Tools menu and select Hex Calculator. This is a handy tool 
that will let you do basic maths, and convert decimal to hexadecimal and 
back. 


The tail size is 49951322 - 49880538 = 70784. Remember that we have 
previously shown that each entry in the tail is 0x40 or 64 bytes in length. 
Thus we can also calculate the number of files in the archive: 70784 / 64 = 
1106! This may come in handy if we wish to search for variables such as 
number of files in the GRA. 

For now, let's go back to the tail pointer. Hit Home. You may bookmark and/or 
colour map the 4 bytes that represent the tail pointer if you wish. Then, set the 
cursor on the offset to the right of the tail pointer (offset 8). Cast your eyes on 
the Data Inspector. If we interpret the value as a 32-bit long we get 70784! We 
have found the tail size variable in the GRA. 

The next 32-bit variable at offset 12 is too large to point to anywhere in the 
file. We can check out some more offsets, to see if there's a 32-bit (or 16-bit) 
value of 1106 somewhere at the start. After a while, we conclude there's 
none. 


Up to now we have identified a header in the pak0.pak (the GRAIS at offset 0, 
the tail pointer at offset 4 and the tail size at offset 8): 12 bytes in total. 
Furthermore, we have found that the tail consists of 64-byte entries or blocks, 
and that the first variable in each block is a filename string of variable length. 
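
The header as we understand it so far can be read with a few lines of Python. This is a sketch: the function name is ours, and the header bytes are rebuilt from the values found in the Quake 2 demo rather than read from disk.

```python
import struct


def read_pak_header(data: bytes):
    """Return (tail_pointer, tail_size) from the 12-byte header."""
    magic, tail_pointer, tail_size = struct.unpack_from("<4sII", data, 0)
    if magic != b"PACK":
        raise ValueError("GRAIS is not 'PACK'")
    return tail_pointer, tail_size


# The 12 header bytes as derived in the text (values from the Quake 2 demo).
header = b"PACK" + struct.pack("<II", 49880538, 70784)
tail_pointer, tail_size = read_pak_header(header)
print(tail_pointer + tail_size)  # 49951322, the size of the archive
print(tail_size // 64)           # 1106 entries of 64 bytes
```

The two sanity checks at the end are the same ones we did by hand: the tail must end exactly at the end of the file, and its size must be a whole number of 64-byte blocks.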


We still need to do some more investigating of the tail. The tail might reveal 
other important information about the resources saved in the pak0.pak file. 
Let's go to the start of the tail again (Figure 3F). See how the first filename 
entry (env/unit1_rt.pcx) is followed by a chunk of 0-value bytes. Apparently, 
there's some Padding going on. Let's follow the trail of 0's all the way to the 
second filename string. The trail is broken 56 bytes from the start of the first 
filename string and 8 bytes before the second filename string. You can see a 
byte value of 0x0C (12) at this offset, followed by three 0-bytes. 8 bytes could 
make up two 32-bit values, so let's see what we get if we interpret these 8 
bytes like that. The first long value is 12, the second long value is 23086. 
Intuition will tell you that the first may very well be a pointer to the start of the 
resource (resource offset) as A. it is very low and B. we did not find any 
recognizable variables starting from offset 12 upward. The second long in our 
tail entry might then represent the size of the resource. 

We can check all of this easily, especially if we are dealing with an archive 
that saves entries in the tail in the same order that the actual resources are 
saved in the archive. First we must make sure we find some long values at 
the end of the following entries in the tail as well. This is indeed the case. 
Second, we set the cursor at the end of the second entry, 56 bytes from the 
start of the second filename string (env/unit1_rt.tga). The two long variables 
at this offset are 23098 for the putative resource offset and 196626 for the 
resource size. Now, subtract the first resource offset from the second: 
23098 - 12 = 23086. See how this is exactly our first resource size? We have 
solved the puzzle! Of course, we will check a few more entries to make sure 
we aren't fooled by coincidence. Having done that we are sure about the 
following GRAF for pak0.pak: 


id Software .PAK (Quake engine) 


General outline: 

Offset | Type | Description 
0-3 | String | GRAIS (‘PACK’) 
4-7 | 32-bit (long) | Tail pointer 
8-11 | 32-bit (long) | Tail size 
12 - Tail pointer | Resource data | Resource data 
Tail pointer - End of file | N blocks of 64 bytes | Resource information 


Tail (single entry): 

Relative offset | Type | Description 
0-55 | Null-terminated string, 0-padded to a size of 56 bytes | Resource filename 
56-59 | 32-bit (long) | Resource offset 
60-63 | 32-bit (long) | Resource size 

If we want to read from a .PAK file, we can jump to the tail pointer and load 
tail entries until we reach the end of the file. If we would like to know just how 
many resources there are in a .PAK file, we simply read the value of the tail 
size and divide it by 64! This may be useful if you wish to reserve memory 
before you read tail entries. 
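
Putting the format to work, a directory reader might look like the sketch below. All function names are ours. To keep the example self-contained it builds a tiny archive in memory with the helper build_pak, which exists only for this demonstration. The is_contiguous helper repeats the offset-plus-size check we did by hand.

```python
import struct

ENTRY_SIZE = 64


def read_pak_directory(data: bytes):
    """Return a list of (filename, offset, size) tuples from a .PAK image."""
    magic, tail_pointer, tail_size = struct.unpack_from("<4sII", data, 0)
    if magic != b"PACK":
        raise ValueError("not a PACK archive")
    entries = []
    for pos in range(tail_pointer, tail_pointer + tail_size, ENTRY_SIZE):
        raw_name, offset, size = struct.unpack_from("<56sII", data, pos)
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        entries.append((name, offset, size))
    return entries


def is_contiguous(entries):
    """True if every resource starts where the previous one ends."""
    return all(o1 + s1 == o2
               for (_, o1, s1), (_, o2, _) in zip(entries, entries[1:]))


def build_pak(files):
    """Build a tiny archive in memory, only to exercise the reader."""
    body, tail, offset = b"", b"", 12
    for name, content in files:
        tail += struct.pack("<56sII", name.encode("ascii"), offset, len(content))
        body += content
        offset += len(content)
    return b"PACK" + struct.pack("<II", 12 + len(body), len(tail)) + body + tail


demo = build_pak([("env/a.pcx", b"abc"), ("env/b.tga", b"defgh")])
for name, offset, size in read_pak_directory(demo):
    print(name, offset, size, demo[offset:offset + size])
```

Extracting a resource is then a matter of slicing size bytes from offset, as the last line does.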


There, we've cracked it! 
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Appendix 


9. 


A. Byte Number Table 


[Table: the decimal values 0 to 255 with their binary representation, one 
column per bit from Bit 0 (2^0) to Bit 7 (2^7). The binary columns are not 
recoverable.] 
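
A byte number table is easy to regenerate whenever you need one. The sketch below prints the decimal value, the hexadecimal value and the eight bits of every byte value; the layout is our own choice, with Bit 7 printed on the left.

```python
def byte_table_row(value: int) -> str:
    """One row: decimal value, hexadecimal value, and the eight bits."""
    return f"{value:3d}  {value:02X}  {value:08b}"


for value in range(256):
    print(byte_table_row(value))
```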
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B. American Standard Code for Information Interchange 
(ASCII) Table 


1. Standard 

Code | Name 
1 | Start of heading 
2 | Start of text 
3 | End of text 
4 | End of transmit 
5 | Enquiry 
6 | Acknowledge 
7 | Audible bell 
8 | Backspace 
9 | Horizontal tab 
10 | Line feed 
11 | Vertical tab 
12 | Form feed 
13 | Carriage return 
14 | Shift out 
15 | Shift in 
16 | Data link escape 
17 | Device control 1 
18 | Device control 2 
19 | Device control 3 
20 | Device control 4 
21 | Neg. acknowledge 
22 | Synchronous idle 
23 | End trans. block 
24 | Cancel 
25 | End of medium 
26 | Substitution 
27 | Escape 
28 | File separator 
29 | Group separator 
30 | Record separator 
31 | Unit separator 

[Table: the printable characters, codes 32 to 126. The character columns are 
not recoverable.] 


2. Extended 

[Table: the extended characters, codes 128 to 255. The character columns are 
not recoverable.] 


Tables by Alain Courteau, Drummondville, Canada 
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C. Format of some common Game Archives 


Halo *.MAP 


General outline 

Position (bytes) | Type | Description 
0-3 | Long | Version: This format only applies to versions 1 and 2. 
4-7 | Long | TailPointer_Filenames: The pointer to the Filenames Tail (an array of null-terminated strings) 
8-11 | Long | TailPointer_FileInformation: The pointer to the FileInformation Tail (each entry for a file is 12 bytes in total) 
12-15 | Long | Archive_ItemCount: The number of items or files stored in the archive 
16 - TailPointer_Filenames | Data | Resources: The files in the archive 
TailPointer_Filenames - TailPointer_FileInformation | Array of Filenames Tail | The names of the files in the archive 
TailPointer_FileInformation - EOF | Array of FileInformation Tail | The directory of files 


Filenames Tail (Archive_ItemCount) 

Position (bytes) | Type | Description 
0 - NULL | null-terminated string | Item_Filename: The name of the file 


FileInformation Tail (Archive_ItemCount) 

Position (bytes) | Type | Description 
0-3 | Long | Pointer_Filename: A relative pointer to the start of the null-terminated filename for this file. This is relative to the TailPointer_Filenames offset 
4-7 | Long | Item_Size: The size of the file 
8-11 | Long | Item_Offset: The offset to the file 

Remarks: 

Note that not all *.map files use this format. This is because Halo uses *.map 
files for 2 purposes: archives, and actual maps. Therefore it is essential that 
you check the version field equals either 1 or 2. 
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Lock On, Midtown Madness 3 *.CDDS 


General outline 

Position (bytes) | Type | Description 
0 - DDSInfo_Offset | Header | Header: The header of the archive 
DDSInfo_Offset - DDSMIPInfo_Offset | DDSInfo | DDSInfo: Information on the DDS 
DDSMIPInfo_Offset - DDSDataObjects_Offset | DDSMipInfo | DDSMipInfo: Information on DDS mipmaps 
DDSDataObjects_Offset - EOF | DDSDataObjects | DDSDataObjects: The actual DDS data 


Header 

[Table not recoverable.] 


DDSInfo (Archive_ItemCount) 

[Only the following two rows are recoverable.] 

44-47 | Long | Item_HighsizeCount: I haven't quite figured out the logic of this yet, but this is the number of MIPs in the current collection that are greater than 2048 bytes in size. These also start at a different offset. Generally, the small files are collected at the front of the file, and those larger than 2048 come after the smaller files. Possibly something to do with the speed of the game: loading the bigger ones in memory for quick display, while the smaller ones may be read from the file? 
[position not recoverable] | Item_MIPInfoOffset: The relative offset of the DDSMipInfo entry for the current collection. The offset is relative from this position in the archive. 


DDSMipInfo (Archive_ItemCount, Item_MIPCount) 

Position (bytes) | Type | Description 
0-3 | Long | MIP_RelativeOffset: The relative offset to the current MIP data chunk 
4-7 | Long | MIP_Size: The size of the MIP in bytes (the same as MIP_Width * MIP_Height) 
8-11 | Long | MIP_Height: The height of the current MIP (in pixels) 
12-15 | Long | MIP_Width: The width of the current MIP (in pixels) 
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Remarks: 


Note again that the MIPs that are in the CDDS are DDS files, without the 
128-byte DDS header. To be able to view them you need to insert the header 
before the MIP and save it as a new DDS file. Some knowledge of the DDS 
header structure is needed, but it can be looked up on the internet. One way 
to do it is to rip a header from an existing DDS file (saved in the same surface 
format, e.g. "DXT5") and insert that before the MIP each time you wish to 
save. Make sure you change the height and width variables in the DDS 
header (located at byte 13 and 17 respectively) to their corresponding MIP 
values before you save the file. 


Note that the collection of MIPs for each texture starts with the MIP in the 
original dimensions, for example 256 x 64, and each dimension is halved for 
each subsequent MIP. Thus, you will find that the order of the MIPs in the 
example will be 256x64, 128x32, 64x16, 32x8, 16x4, 8x2, 4x1, 2x1, 1x1. So 
the number of MIPs present in each collection is determined by the point 
where the dimensions of the texture are 1x1. The size of the MIP naturally 
becomes 4 times smaller each time, until the size of 16 is reached. So, 
although 4x1 = 4, the size will still be 16 bytes. 
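
The MIP sequence and sizes described above can be generated with a short sketch. The function name is ours, and it assumes, as stated above, that the size equals width times height with a floor of 16 bytes.

```python
def mip_chain(width: int, height: int, min_size: int = 16):
    """Return (width, height, size_in_bytes) for every MIP down to 1x1."""
    chain = []
    while True:
        # Size equals width * height, but never drops below min_size bytes.
        chain.append((width, height, max(width * height, min_size)))
        if width == 1 and height == 1:
            return chain
        width, height = max(width // 2, 1), max(height // 2, 1)


for w, h, size in mip_chain(256, 64):
    print(f"{w}x{h}: {size} bytes")
```

For the 256 x 64 example this yields nine MIPs, the last four of which are 16 bytes each.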
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Sacrifice *.WAD 


General outline 

Position (bytes) | Type | Description 
0-3 | String | Header: [illegible] of a WAD 
[illegible] | [illegible] | [illegible] 
[illegible] | [illegible] | Unknown: Unknown meaning 
[illegible] - Tail | [illegible] | The data of each file 
Tail - EOF | Tail (compressed) | The directory. Note that it is Zlib compressed, so you will need to decompress the directory before reading it. 


Position (bytes) | Type | Description 
0-3 | String | Filename: The name of the file. This string is reversed 
4-7 | String | Type: The type of the file. This string is reversed 
[rows not recoverable] 
[illegible] | [illegible] | FolderDivergance: Has a value of either 0 or 1. The value 1 indicates the start of a new folder. The value 0 indicates a file in the current folder. 
24-27 | Unknown | Unknown: Unknown meaning 

Remarks: 


Note that there is no entry in the tail that represents the offset of each file in 
the archive. You will have to take the position after the "unknown" long in the 
header (the first bytes) as the offset of the first file in the list. The offset of 
each following file can then be calculated by adding the size of the previous 
file to the offset of the previous file. 


Note that the name of an item is always reversed (for some obscure reason) 
and that the first 4 bytes in the name are actually a string representing the 
type of the item (e.g. WAD!, FLDR, TEXT, SAMP), and the second 4 bytes 
are the name of the item. Note that SAMP files (e.g. in sounds.wad) are 
actually RIFF (*.WAV) files, preceded by their own 32-byte header. 
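
Both remarks translate into a few lines of code. In this sketch the function names are ours, and the first offset of 12 in the example is only a placeholder for the position after the "unknown" long in the header.

```python
def resource_offsets(first_offset: int, sizes):
    """Offset of each file: the previous offset plus the previous size."""
    offsets, position = [], first_offset
    for size in sizes:
        offsets.append(position)
        position += size
    return offsets


def unreverse(tag: bytes) -> str:
    """Stored strings are reversed; flip them back."""
    return tag[::-1].decode("ascii")


print(resource_offsets(12, [100, 50, 7]))  # [12, 112, 162]
print(unreverse(b"!DAW"))                  # WAD!
print(unreverse(b"RDLF"))                  # FLDR
```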
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D: Useful References 


o XeNTax (http://www.xentax.com) 

The website of co-author Mike Zuurman, and home to MultiEx Commander. 
MultiEx Commander is a Windows-based program that can open and 
manipulate many hundreds of game archives, through use of its own 
specialist scripting language. 


o WATTO Studios (http://www.watto.org) 

The website of co-author WATTO, and home to Game Extractor. Game 
Extractor is a game archive viewer/editor that can be run on any platform, and 
supports a host of different game formats. 


o Wotsit (http://www.wotsit.org) 

A huge collection of file format specifications. The specifications are often 
taken from the company developing or maintaining the format, so the material 
is reliable. Contains specifications for all types of files including sounds, 
images, text, archives, and executables. 
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E: Common File Format Tags 


Tag | Extension | Type | Description 
BM | *.bmp *.dib | Image | Microsoft-standard Bitmap image (http://www.microsoft.com) 
CWS | *.swf | Animation | Macromedia Flash animation (Compressed) (http://www.macromedia.com/flash) 
FWS | *.swf | Animation | Macromedia Flash animation (Uncompressed) (http://www.macromedia.com/flash) 
GIF | *.gif | Image | GIF image (writing GIF images requires a license from CompuServe) 
MSCF | *.cab | Archive | Microsoft Cabinet archive (http://www.microsoft.com) 
MZ | *.exe *.dll | Executable | Windows executable program application (http://www.microsoft.com) 
%PDF | *.pdf | Document | Standard Adobe PDF document / e-book (http://www.adobe.com) 
PK | *.zip | Archive | Standard ZIP archive (http://www.pkzip.com) 
Rar! | *.rar | Archive | RAR Archive (http://www.rarsoft.com) 
RIFF | *.wav | Audio | Microsoft-standard audio file (http://www.microsoft.com) 
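
When you have carved a resource out of an archive, a lookup of its first bytes against these tags is a quick first test. The sketch below mirrors the table; the function name and the short descriptions are ours.

```python
# Tags taken from the table above; %PDF is the tag that starts a PDF file.
TAGS = {
    b"BM": "Bitmap image",
    b"CWS": "Flash animation (compressed)",
    b"FWS": "Flash animation (uncompressed)",
    b"GIF": "GIF image",
    b"MSCF": "Microsoft Cabinet archive",
    b"MZ": "Windows executable",
    b"%PDF": "PDF document",
    b"PK": "ZIP archive",
    b"Rar!": "RAR archive",
    b"RIFF": "RIFF file such as *.wav",
}


def identify(data: bytes):
    """Return the description of the longest tag that data starts with."""
    for tag in sorted(TAGS, key=len, reverse=True):
        if data.startswith(tag):
            return TAGS[tag]
    return None


print(identify(b"RIFF\x24\x00\x00\x00WAVE"))  # RIFF file such as *.wav
print(identify(b"PACK"))                      # None
```

Two-letter tags such as BM or MZ will match by coincidence now and then, so treat a hit as a hint and confirm it by looking at the rest of the file.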
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Legal Information 


This guide aims to help programmers gain an understanding and appreciation 
of the various file formats in use today. Using this knowledge will help 
programmers develop their own formats, and increase support for different file 
formats in their own programs. 


This guide supports the exploration of file formats that are open-source or 
standards. We encourage the exploration of any file, so long as the 
exploration is for your own benefit only, and will not be used or distributed in 
an attempt to do anything illegal, including hacking of files, bypassing legal or 
copyright measures introduced into a file, or for use against a company or 
individual. If your exploration is for your own benefit and use, we fully support 
you — there is nothing illegal about exploring the files on your own computer in 
your own access. If you do wish to use the information you have gained 
through exploration, make sure you check for any licensing issues, 
trademarks, or copyrights that may be associated with the format — otherwise 
you could end up with major fines and criminal charges. 


This guide, and the authors, do not encourage or support the exploration of 
copyright or otherwise protected material for any purpose. The reading of this 
material does not grant you permission to modify or distribute information 
contained in any file that is not of your own creation. 


This guide, and the authors, do not support and are not affiliated with any 
game, program, company, copyright, or trademark that is used within. All 
copyrights, trademarks, and similar rights are used for identification purposes 
only. All rights reserved. 
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